The table compares several different configurations of an LGBM model (based on tuning metric and included features).
| | Model | auc | pr_auc | _f1_micro | _f1_macro | logloss | accuracy | precision_macro | recall_macro | f1_macro | target_f1 | target_recall | target_precision | fbeta_1.5 | fbeta_2.5 | fbeta_4.0 | log_loss | elapsed_time | total_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LGBM_Dart_AUC | 0.79 | 0.28 | 0.73 | 0.56 | 9.74 | 0.73 | 0.58 | 0.72 | 0.56 | 0.30 | 0.70 | 0.19 | 0.38 | 0.51 | 0.61 | 0.54 | 204.60 | 21.87 |
| 1 | LGBM_AUC | 0.79 | 0.28 | 0.74 | 0.57 | 9.54 | 0.74 | 0.58 | 0.72 | 0.57 | 0.30 | 0.69 | 0.19 | 0.38 | 0.51 | 0.60 | 0.53 | 73.70 | 23.59 |
| 2 | LGBM_AUC_All_Features | 0.79 | 0.28 | 0.74 | 0.57 | 9.42 | 0.74 | 0.58 | 0.71 | 0.57 | 0.30 | 0.68 | 0.19 | 0.38 | 0.50 | 0.59 | 0.53 | 57.00 | 39.72 |
| 3 | LGBM_AUC_Base_Features | 0.77 | 0.25 | 0.71 | 0.55 | 10.33 | 0.71 | 0.57 | 0.70 | 0.55 | 0.28 | 0.67 | 0.17 | 0.36 | 0.48 | 0.57 | 0.56 | 20.70 | 21.20 |
| 4 | LGBM_Weighted_LogLoss | 0.76 | 0.24 | 0.73 | 0.56 | 9.61 | 0.73 | 0.57 | 0.69 | 0.56 | 0.28 | 0.64 | 0.18 | 0.36 | 0.47 | 0.56 | 0.54 | 18.80 | 11.85 |
| 5 | Baseline_Only_CreditRatings | 0.73 | 0.20 | 0.68 | 0.52 | 11.63 | 0.68 | 0.55 | 0.67 | 0.52 | 0.25 | 0.65 | 0.15 | 0.32 | 0.45 | 0.55 | 0.61 | 2.30 | 2.14 |
We have selected LGBM_Dart_AUC as our final "production" model. LightGBM by default uses 'GBDT' (Gradient Boosting Decision Tree) as its boosting algorithm. 'DART' (Dropouts meet Multiple Additive Regression Trees) is an alternative boosting algorithm that helps prevent overfitting and improve model generalization by randomly dropping a fraction of boosting trees during training (similar to dropout in neural networks).
DART's process of dropping trees and adding them back into the model requires more computational work and longer training periods. Additionally, the stochastic nature of the dropout process can lead to less consistent performance across different training runs compared to GBDT, which might be problematic when tuning.
However, we have chosen to use LGBM + DART as our primary model because it provides slightly better performance and its theoretical advantages (reduced overfitting and better generalization) outweigh the slower training time (5-fold CV + fitting a model on the full train set only takes ~2 minutes).
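To make the GBDT/DART difference concrete, here is a minimal sketch of how the two configurations might differ. The parameter names (`boosting_type`, `drop_rate`, `skip_drop`, `max_drop`, `drop_seed`) are LightGBM's documented options; every value below is a hypothetical placeholder, not the tuned setting used for LGBM_Dart_AUC.

```python
# Hypothetical parameter sets; the values are illustrative, not the tuned ones.
gbdt_params = {
    "boosting_type": "gbdt",  # LightGBM's default boosting algorithm
    "n_estimators": 500,
    "learning_rate": 0.05,
}

dart_params = {
    **gbdt_params,
    "boosting_type": "dart",  # switch the booster to DART
    "drop_rate": 0.1,         # fraction of previous trees dropped in a dropout step
    "skip_drop": 0.5,         # probability of skipping dropout in an iteration
    "max_drop": 50,           # upper bound on trees dropped per iteration
    "drop_seed": 4,           # seed for choosing dropped trees (repeatable runs)
}

# With lightgbm installed, either dict can be passed to the scikit-learn wrapper:
#   from lightgbm import LGBMClassifier
#   model = LGBMClassifier(**dart_params)

# Show only the settings that differ from the GBDT configuration.
changed = {k: v for k, v in dart_params.items() if gbdt_params.get(k) != v}
print(changed)
```

Fixing `drop_seed` is one way to reduce the run-to-run variation mentioned above when comparing tuning trials.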
Overall, we can see that while we are able to achieve a relatively high AUC of 0.78-0.79, the classification performance is still very poor. Specifically, the very large rate of false positives is a concern because it would mean that ~32% of loans in the sample which were not defaulted on would be rejected when using our model.
On the other hand, a fairly large proportion (~65%) of problematic loans were detected as such. Considering the inherent risks of catering to clients with poorly known credit history, that is a relatively reasonable result.
We'll investigate this further, but generally we can see that the model might be useful to Home Credit if they decide to take a more conservative approach to granting loans.
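The two rates discussed above come straight from a confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical (chosen per 1,000 loans to roughly mirror the ~32% and ~65% figures) and are not the notebook's actual confusion matrix.

```python
def classification_rates(tp, fp, tn, fn):
    """Derive the rates discussed in the text from confusion-matrix counts."""
    return {
        # share of actual defaulters flagged by the model
        "recall": tp / (tp + fn),
        # share of good loans that would be wrongly rejected
        "false_positive_rate": fp / (fp + tn),
        # share of flagged loans that actually defaulted
        "precision": tp / (tp + fp),
    }

# Hypothetical counts: 80 defaulters and 920 good loans out of 1,000.
example = classification_rates(tp=52, fp=294, tn=626, fn=28)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With a default rate this low, even a moderate false positive rate produces far more wrongly flagged loans than correctly flagged ones, which is why precision stays low despite a reasonable AUC.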
Appending previous history drop drop_cols_post_proc: 227 after drop_cols_post_proc: 121 Full DS size: 307511
array([0.52224028, 0.38090953, 0.60392955, ..., 0.77732527, 0.17447063,
0.30083482])
101626 0.522240
155435 0.380910
14857 0.603930
123290 0.265638
32502 0.094382
...
292358 0.455475
43225 0.749569
213138 0.777325
138631 0.174471
233831 0.300835
Name: TARGET, Length: 23064, dtype: float64
pandas.core.series.Series
To determine interest rates for each loan while accounting for default risk and aiming to maximize total return on a portfolio, we can employ several advanced modeling techniques. Determining appropriate interest rates generally involves not only predicting the risk of default but also integrating this risk assessment into pricing strategies that reflect the level of risk associated with each loan.
We'll use a simple Risk-Based Pricing Model in our calculation: it directly links the interest rate charged on a loan to the estimated risk of default. The basic steps to build such a model are:
EL = PD × LGD
Additionally, more complex models could (and probably should) be used in real-world scenarios to incorporate market conditions (e.g. base interbank rates, margins offered by competitors, etc.).
Optimization Framework: Use an optimization model that calculates the optimal interest rate for each loan type. This model would use inputs from the risk model (PD and LGD) and incorporate constraints like minimum return requirements, risk appetite, and regulatory requirements.
Simulation Techniques: Simulate different interest rate scenarios for various risk levels to determine the interest rate that maximizes profit while keeping the default risk within acceptable bounds.
Portfolio Diversification: Assess the risk contribution of each loan type to the overall portfolio and adjust interest rates to achieve the desired diversification and risk-return profile.
Risk-adjusted Return on Capital (RAROC): Use RAROC to evaluate the profitability of a loan, considering the capital at risk. RAROC is used to ensure that the risk-adjusted return meets a certain threshold.
Mean. Model Predicted Probabilities = 39.09%
Actual Portfolio Default Rate = 8.07%
We can't directly use the probabilities output by our model for any financial analysis because they are not aligned with the true default probabilities (a mean prediction of 39.09% against an actual default rate of 8.07%), and well-calibrated probabilities are necessary for risk-based pricing models.
We'll have to calibrate our probabilities using CalibratedClassifierCV with isotonic regression (a non-parametric approach that fits a monotonically non-decreasing calibration function).
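To illustrate what isotonic calibration does, here is a from-scratch sketch of the pool-adjacent-violators algorithm, a standard way to compute an isotonic (non-decreasing) least-squares fit. It is not the scikit-learn implementation used in this notebook, and the scores and labels are hypothetical.

```python
def isotonic_fit(values):
    """Pool Adjacent Violators: non-decreasing least-squares fit to `values`."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def calibrate(scores, labels):
    """Sort by raw score, then fit observed outcomes as a monotone function."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fitted = isotonic_fit([labels[i] for i in order])
    return [(scores[i], p) for i, p in zip(order, fitted)]


# Hypothetical raw model scores and observed outcomes (1 = default).
raw_scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]
outcomes = [1, 0, 0, 1, 1, 0]
calibrated = calibrate(raw_scores, outcomes)
for score, prob in calibrated:
    print(f"raw={score:.1f} -> calibrated={prob:.2f}")
```

The fitted values average to the observed default rate on the data they were fitted to, which is why the mean calibrated probability lands close to the actual portfolio default rate.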
0.29301075268817206
Mean. Model Calibrated Probabilities = 8.17%
The application_test.csv file does not contain data on the interest rate for granted loans, which makes it impossible to calculate the "base" rate used by Home Credit. For our example we'll use a base interest rate of 4% (approximately the 12-month Euribor) and a margin of 6%.
We'll set the LGD to a constant value of 0.5. LGD generally ranges from 20% to 60% depending on the type of loan and collateral. For unsecured loans, LGD tends to be higher (close to or exceeding 50%) due to the lack of recoverable assets.
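The pricing rule can be sketched as follows. The formula rate = base + margin + EL is an assumption inferred from the figures reported in this section (4% + 6% + an expected loss of about 4.08% gives about 14.08%); the function names are hypothetical.

```python
def expected_loss(pd_default, lgd):
    """EL = PD x LGD, expressed as a fraction of the loan amount."""
    return pd_default * lgd


def risk_based_rate(pd_default, lgd=0.5, base_rate=0.04, margin=0.06):
    """Assumed pricing rule: base rate + margin + expected loss."""
    return base_rate + margin + expected_loss(pd_default, lgd)


# Using the (rounded) mean calibrated default probability of 8.17%.
mean_pd = 0.0817
print(f"EL   = {expected_loss(mean_pd, 0.5):.4f}")
print(f"Rate = {risk_based_rate(mean_pd):.3%}")
```

Applying the same function to each loan's calibrated probability yields a per-loan rate that rises linearly with estimated default risk.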
Calibrated Probabilities mean: 8.17%
Loss Given Default (LGD): 0.5
Expected Loss (EL): 0.04083048044259347
Interest Rates: 14.083%
[{'label': 'A Grade', 'start': 0.0, 'end': 0.17682922844255478, 'color': 'grey', 'description': 'Default rate up to 1%'}, {'label': 'B Grade', 'start': 0.17682922844255478, 'end': 0.36787168375037776, 'color': (0.996078431372549, 0.8497808535178777, 0.46145328719723183), 'description': 'Default rate up to 3%'}, {'label': 'C Grade', 'start': 0.36787168375037776, 'end': 0.5064617183285625, 'color': (0.9920953479430988, 0.5490657439446367, 0.23418685121107266), 'description': 'Default rate up to 7%'}, {'label': 'D Grade', 'start': 0.5064617183285625, 'end': 0.7781111185306839, 'color': (0.8866897347174163, 0.09956170703575548, 0.11072664359861592), 'description': 'Default rate up to 15%'}, {'label': 'E-F-G Grade', 'start': 0.7781111185306839, 'end': 1.0, 'color': 'red', 'description': 'Default rate up to 100%'}]
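The grade bands above can be applied to a predicted probability with a simple threshold lookup. In this sketch the boundaries are the `end` values from the output above rounded to four decimals; treating each boundary as belonging to the higher-risk grade is an assumption, as is the helper name `assign_grade`.

```python
from bisect import bisect_right

# Upper boundaries of the first four bands, rounded from the output above.
GRADE_BOUNDS = [0.1768, 0.3679, 0.5065, 0.7781]
GRADE_LABELS = ["A Grade", "B Grade", "C Grade", "D Grade", "E-F-G Grade"]


def assign_grade(probability):
    """Map a model probability to a grade band via binary search."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    return GRADE_LABELS[bisect_right(GRADE_BOUNDS, probability)]


for p in (0.10, 0.30, 0.50, 0.60, 0.90):
    print(p, "->", assign_grade(p))
```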
The chart shows the performance of the model if only individuals with a predicted default Prob. > T are selected. Additionally, the overlay indicates the number of people whose predicted P is in a given range. The overlays can be used to select the most at-risk individuals based on the probability predicted for them.
'Returns and Default Rate Based on Default Probability Model:'
| predicted_grades | default_rate | mean_interest_rate | expected_return |
|---|---|---|---|
| A | 0.010 | 0.104 | 0.103 |
| B | 0.030 | 0.114 | 0.111 |
| C | 0.070 | 0.134 | 0.125 |
| D | 0.150 | 0.176 | 0.150 |
| E-F-G | 0.351 | 0.289 | 0.188 |
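The expected_return column appears to be consistent with mean_interest_rate × (1 − default_rate), i.e. interest is assumed to be earned only on loans that do not default. The sketch below checks that assumed relationship against the table's own figures; it is an inference from the numbers, not the notebook's actual calculation.

```python
# (default_rate, mean_interest_rate, expected_return) copied from the table.
grades = {
    "A": (0.010, 0.104, 0.103),
    "B": (0.030, 0.114, 0.111),
    "C": (0.070, 0.134, 0.125),
    "D": (0.150, 0.176, 0.150),
    "E-F-G": (0.351, 0.289, 0.188),
}


def expected_return(default_rate, interest_rate):
    """Assumed relationship: interest earned only on non-defaulting loans."""
    return interest_rate * (1.0 - default_rate)


recomputed = {
    grade: expected_return(default_rate, rate)
    for grade, (default_rate, rate, _) in grades.items()
}
for grade, value in recomputed.items():
    print(f"{grade}: recomputed={value:.3f}, table={grades[grade][2]:.3f}")
```

Note that this measure ignores the principal lost on defaulted loans, so it likely overstates the return of the riskiest grades.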
0.00711064862989941
As a final step, we can try to calculate the hypothetical real return of Home Credit's loan portfolio if they used our model to employ a more conservative lending strategy (i.e. rejected all loans where the pre-calibration default probability is > 0.5).
Again, this is only an example because we are still using the interest rates calculated from the calibrated default probabilities, which should be replaced with actual interest rates from Home Credit's loans to make the data actually meaningful.
| | Actual | Hypothetical |
|---|---|---|
| Total Loan Amount | 13921.5362M | 10238.6082M |
| Total Interest Paid | 18.1365M | 14.4314M |
| Total Return % | 0.13% | 0.14% |
| Default Rate | 7.97% | 3.32% |
| Total Loss | 1025.7732M | 167.8826M |
| Losses Avoided | None | 857.8905M |
| Interest Lost | None | -3.7051M |
| Total Applications Accepted | 23064 | 16306 |
That being said, the Total Loss and Losses Avoided figures are likely to be more accurate than the total return (we're using a static LGD of 60%, which again should be replaced with the real value from Home Credit's data). We can see that we could reduce the total losses by ~80% while accepting ~30% fewer applications.
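The comparison above can be sketched on a toy portfolio. The loans, field names, and helper functions below are hypothetical; the 0.5 rejection threshold and the static 60% LGD follow the text, and loss is assumed to be loan amount × LGD for every defaulted loan.

```python
LGD = 0.6               # static loss given default, as stated in the text
REJECT_THRESHOLD = 0.5  # reject when the pre-calibration probability exceeds this

# Hypothetical loans: amount, raw model probability, and observed outcome.
loans = [
    {"amount": 100_000, "raw_prob": 0.2, "defaulted": False},
    {"amount": 200_000, "raw_prob": 0.7, "defaulted": True},
    {"amount": 150_000, "raw_prob": 0.6, "defaulted": False},
    {"amount": 50_000, "raw_prob": 0.4, "defaulted": True},
    {"amount": 300_000, "raw_prob": 0.1, "defaulted": False},
]


def summarize(portfolio, lgd=LGD):
    """Aggregate loan amount, loss on defaulted loans, and default rate."""
    n_defaults = sum(loan["defaulted"] for loan in portfolio)
    return {
        "accepted": len(portfolio),
        "total_amount": sum(loan["amount"] for loan in portfolio),
        "total_loss": sum(loan["amount"] * lgd for loan in portfolio if loan["defaulted"]),
        "default_rate": n_defaults / len(portfolio) if portfolio else 0.0,
    }


actual = summarize(loans)
hypothetical = summarize([l for l in loans if l["raw_prob"] <= REJECT_THRESHOLD])
losses_avoided = actual["total_loss"] - hypothetical["total_loss"]

print("actual:", actual)
print("hypothetical:", hypothetical)
print("losses avoided:", losses_avoided)
```

The toy data also shows the cost of the stricter policy: one rejected loan was repaid in full, so the interest it would have earned is lost.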
We've used LGBM, which is a relatively complex "black box" model that might not be ideal for loan evaluation and similar tasks because it's hard to objectively explain the specific decisions the model made (as regulatory or customer-related requirements may demand).
However, we believe that we were largely able to overcome this shortcoming through the use of single-observation SHAP plots:
they allow us to attribute the impact of specific features (e.g. credit scores, client income, etc.) on the estimated risk, which allows us to select an appropriate grade and interest rate and to decide whether the loan should be approved based on our acceptable risk preferences:
Each plotted line explains a single model prediction.
The chart above visualizes the model's prediction process for individual observations (for a subsample of 100 loans). For a single sample, the line charts the path from the base value to the final predicted value. Each feature causes the line to shift up or down. This shift is determined by its SHAP value. Red indicates a feature pushing the prediction higher, while blue indicates it pushes the prediction lower.
Below we have included some individual predictions (focusing on the least accurate, most accurate and some random cases in between). In theory, these plots can be used as a "continuous" decision tree to explain why a specific loan was accepted or not for regulatory or other purposes.
E.g. "Actual Y Value: 1, Predicted Probability: 0.96" indicates that the applicant had payment difficulties and that our model predicted this outcome with a 96% likelihood.
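The path traced by each line follows from SHAP's additivity property: the model output for one observation equals the base value plus the sum of that observation's SHAP values. The sketch below reproduces such a path by hand; the base value, feature names, and contributions are hypothetical, and it assumes the values are in log-odds space (the usual case for a tree explainer on a binary LightGBM classifier), so a sigmoid converts the end point to a probability.

```python
import math

# Hypothetical base value (average model output, in log-odds) and per-feature
# SHAP values for a single loan application.
base_value = -2.4
shap_values = [
    ("credit_score", 1.1),      # pushes the predicted risk up
    ("client_income", -0.3),    # pushes the predicted risk down
    ("loan_amount", 0.6),
    ("employment_years", -0.2),
]


def decision_path(base, contributions):
    """Cumulative path from the base value to the final model output."""
    path = [("base value", base)]
    running = base
    for feature, value in contributions:
        running += value
        path.append((feature, running))
    return path


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


path = decision_path(base_value, shap_values)
for step, value in path:
    print(f"{step:>18}: {value:+.2f}")
final_probability = sigmoid(path[-1][1])
print(f"predicted probability: {final_probability:.3f}")
```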
Appendix
To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\entityset\entityset.py:1914: UserWarning: index SK_BUREAU_ID not found in dataframe, creating new integer column warnings.warn( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. 
To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format. 
pd.to_datetime( C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function min at 0x0000019C5C314400> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead. ).agg(to_agg) C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function mean at 0x0000019C5C314CC0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead. ).agg(to_agg) C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function max at 0x0000019C5C3142C0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead. ).agg(to_agg)
Appending previous history Full DS size: 307511
| | Total NaN Values | Proportion NaN (%) |
|---|---|---|
| PrevRatioRejectedAccepted | 16847 | 5.0 |
This notebook includes an analysis of selected variables (chosen based on their importance in predicting the target variable) and their relationships. Individual analysis of each variable is available in the EDA_appendices notebook.
| | Total NaN Values | Proportion NaN (%) |
|---|---|---|
| ExtSource2 | 660 | 0.0 |
| ExtSource3 | 60965 | 20.0 |
| ExtSource1 | 173378 | 56.0 |
| AmtGoodsPrice | 278 | 0.0 |
| OwnCarAge | 202929 | 66.0 |
| PrevAmtDownPaymentSum | 16454 | 5.0 |
| AmtAnnuity | 12 | 0.0 |
| MeanbureaudaysCredit | 44020 | 14.0 |
| MeanbureauamtCreditSumDebt | 51380 | 17.0 |
| PrevAvgYieldGroup | 18945 | 6.0 |
| PrevCreditReceivedRequestedDiff | 16454 | 5.0 |
| OccupationType | 96391 | 31.0 |
| PrevRatioRejectedAccepted | 16847 | 5.0 |
| MaxbureaudaysCreditEnddate | 46269 | 15.0 |
| PrevLastLoanGoodsCategory | 16454 | 5.0 |
| MeanbureauamtCreditMaxOverdue | 123625 | 40.0 |
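A missing-value summary like the one above can be produced with a few lines of pandas. The sketch below is only an illustration: the helper name `nan_summary` and the toy frame are assumptions, not the notebook's actual code, and the rounding to whole percentages is inferred from the values shown in the table.

```python
import numpy as np
import pandas as pd


def nan_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a share of rows."""
    total = df.isna().sum()
    summary = pd.DataFrame(
        {
            "Total NaN Values": total,
            # Rounded to whole percentages, as in the table above (assumption).
            "Proportion NaN (%)": (total / len(df) * 100).round(0),
        }
    )
    # Keep only the columns that actually contain missing values.
    return summary[summary["Total NaN Values"] > 0]


# Toy frame standing in for the real feature matrix (hypothetical data).
toy = pd.DataFrame(
    {
        "ExtSource1": [0.5, np.nan, np.nan, 0.2],
        "AmtAnnuity": [100.0, 200.0, 300.0, 400.0],
    }
)
summary = nan_summary(toy)
print(summary)
```

Columns without missing values are dropped from the summary, which matches the table above listing only features that have at least one NaN.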
The TARGET variable (loans with payment difficulties) is most strongly correlated with the credit ratings obtained from external sources. The correlations are very weak (the largest absolute coefficient in the table below is 0.161) but still statistically significant.
| Column | Coefficient | P-Value |
|---|---|---|
| ExtSource3 | -0.161 | 0.000 |
| ExtSource1 | -0.131 | 0.000 |
| ExtSource2 | -0.128 | 0.000 |
| MeanbureaudaysCredit | 0.093 | 0.000 |
| OccupationType | 0.075 | 0.000 |
| DaysEmployed | 0.074 | 0.000 |
| PrevRatioRejectedAccepted | 0.073 | 0.000 |
| PrevRatioRejectedAccepted_cats | 0.072 | 0.000 |
| PrevRatioRejectedAccepted_cats_2 | 0.072 | 0.000 |
| OrganizationType | 0.069 | 0.000 |
| NameEducationType | 0.067 | 0.000 |
| PrevAmtDownPaymentSum | -0.057 | 0.000 |
| PrevCreditReceivedRequestedDiff | 0.055 | 0.000 |
| DaysBirth | 0.053 | 0.000 |
| PrevLastLoanGoodsCategory | 0.051 | 0.000 |
| OwnCarAge | 0.050 | 0.000 |
| MeanbureauamtCreditSumDebt | 0.049 | 0.000 |
| MeanbureauamtCreditMaxOverdue | 0.044 | 0.000 |
| DaysIdPublish | 0.042 | 0.000 |
| CodeGender | 0.041 | 0.000 |
| PrevAvgYieldGroup | 0.040 | 0.000 |
| FlagDocument3 | 0.039 | 0.000 |
| AmtGoodsPrice | -0.034 | 0.000 |
| MaxbureaudaysCreditEnddate | 0.034 | 0.000 |
| NameFamilyStatus | 0.027 | 0.002 |
| AmtCredit | -0.023 | 0.001 |
Because the data types of the features vary, we used a different method to measure the strength and significance of each pair:
Chi-Squared Test: assesses independence between two categorical variables. Used for bool-bool pairs because both variables are categorical.
Point-Biserial Correlation: measures the correlation between a binary and a continuous variable. Used for bool-numerical pairs to account for the mixed data types.
Spearman's Rank Correlation: assesses the monotonic relationship between two continuous variables. Used for numerical-numerical pairs because the data is not normally distributed.
The Chi-Squared test outputs an unbounded statistic that cannot be compared directly to the point-biserial (pointbiserialr) or Spearman coefficients, so we converted it to Cramér's V, which is normalized between 0 and 1. This makes the values in the matrix more uniform. Note, however, that Cramér's V and Spearman's correlation coefficient are fundamentally different statistics and generally cannot be compared directly.
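The choice of statistic per pair described above can be sketched as follows. This is a minimal illustration assuming SciPy is available; the function names `cramers_v` and `association` and the toy series are hypothetical, and the notebook's own implementation may differ (for example in whether the Yates continuity correction is applied).

```python
import numpy as np
import pandas as pd
from scipy import stats


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V derived from the chi-squared statistic of a contingency table."""
    table = pd.crosstab(x, y)
    # correction=False disables the Yates correction (an assumption of this sketch).
    chi2 = stats.chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))


def association(x: pd.Series, y: pd.Series) -> tuple[str, float]:
    """Pick the statistic based on the dtypes of the two series."""
    x_bool = pd.api.types.is_bool_dtype(x)
    y_bool = pd.api.types.is_bool_dtype(y)
    if x_bool and y_bool:
        return "cramers_v", cramers_v(x, y)
    if x_bool or y_bool:
        binary, numeric = (x, y) if x_bool else (y, x)
        r = stats.pointbiserialr(binary.astype(int), numeric)[0]
        return "point_biserial", float(r)
    return "spearman", float(stats.spearmanr(x, y)[0])


# Toy data (hypothetical): two identical flags and a monotonic numeric pair.
flag_a = pd.Series([True, True, False, False] * 5)
flag_b = flag_a.copy()
amount = pd.Series(np.arange(20, dtype=float))
amount_sq = amount ** 2

print(association(flag_a, flag_b))
print(association(flag_a, amount))
print(association(amount, amount_sq))
```

Cramér's V is always non-negative, whereas the other two coefficients are signed, which is one concrete reason the values in a mixed matrix should be read with care.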
The chart below shows the relationship between selected categorical variables and loan status. E.g. a significantly higher proportion of loans taken out by males had issues.
The charts below show pairs of numerical and categorical features (including some binned numerical features) that have a significant relationship and at least a small effect size (eta_squared > 0.01) based on the non-parametric Kruskal-Wallis test* (one-way ANOVA on ranks), which tests whether the samples originate from the same distribution.
*It is similar to the Mann–Whitney U test but allows comparing more than two groups.
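The screening described above can be sketched with `scipy.stats.kruskal`. The effect-size formula used here, eta squared = (H - k + 1) / (n - k), is a common rank-based estimate and is an assumption: the notebook does not state how its `eta_squared` is computed. The helper name and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def kruskal_eta_squared(df: pd.DataFrame, group_col: str, value_col: str):
    """Kruskal-Wallis H test plus a rank-based eta-squared effect size."""
    groups = [
        g[value_col].dropna().to_numpy()
        for _, g in df.groupby(group_col, observed=True)
    ]
    h_stat, p_value = stats.kruskal(*groups)
    k = len(groups)                   # number of groups
    n = sum(len(g) for g in groups)   # total number of observations
    eta_sq = (h_stat - k + 1) / (n - k)
    return float(h_stat), float(p_value), float(eta_sq)


# Toy data (hypothetical): three clearly separated groups of ten values each.
toy = pd.DataFrame(
    {
        "group": ["a"] * 10 + ["b"] * 10 + ["c"] * 10,
        "value": np.arange(30, dtype=float),
    }
)
h_stat, p_value, eta_sq = kruskal_eta_squared(toy, "group", "value")
print(h_stat, p_value, eta_sq)

# A pair would be charted only if it is significant and has at least a small effect.
keep_pair = p_value < 0.05 and eta_sq > 0.01
```

Passing `observed=True` to `groupby` explicitly also avoids the pandas deprecation warning about the changing default for categorical group keys.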
ExtSource1)
| | Coefficient | Standard Error | P-Value | Conf. Interval Lower | Conf. Interval Upper |
|---|---|---|---|---|---|
| const | 0.600 | 0.040 | 0.0 | 0.521 | 0.680 |
| ExtSource1 | -2.099 | 0.061 | 0.0 | -2.219 | -1.979 |
| ExtSource2 | -1.964 | 0.060 | 0.0 | -2.082 | -1.846 |
| ExtSource3 | -2.779 | 0.062 | 0.0 | -2.902 | -2.657 |
The normalized credit ratings from the three sources are all inversely related to default risk, with ExtSource3 having the strongest influence. A basic logistic model using only these three features already achieves a reasonably high score (AUC = 0.74). However, these results are based on the full training set and are provided only for EDA/feature analysis purposes. Full statistical modelling is done in later sections.
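To make the coefficients easier to interpret, the sketch below plugs them into the logistic function. It assumes the model was fitted on the raw ExtSource scores (taken here to lie between 0 and 1) with an intercept term `const`; the function name and the example inputs are hypothetical.

```python
import math

# Coefficients copied from the table above.
COEFFICIENTS = {
    "const": 0.600,
    "ExtSource1": -2.099,
    "ExtSource2": -1.964,
    "ExtSource3": -2.779,
}


def default_probability(ext1: float, ext2: float, ext3: float) -> float:
    """Predicted default probability for one applicant under the fitted model."""
    log_odds = (
        COEFFICIENTS["const"]
        + COEFFICIENTS["ExtSource1"] * ext1
        + COEFFICIENTS["ExtSource2"] * ext2
        + COEFFICIENTS["ExtSource3"] * ext3
    )
    # Inverse logit turns log-odds into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical applicants with uniformly low and uniformly high external scores.
p_low_scores = default_probability(0.2, 0.2, 0.2)
p_high_scores = default_probability(0.8, 0.8, 0.8)
print(round(p_low_scores, 3), round(p_high_scores, 3))
```

Because all three coefficients are negative, raising any of the scores lowers the predicted probability, which is the inverse relationship described above.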
| PrevRatioRejectedAccepted_cats | count |
|---|---|
| All Accepted | 190370 |
| > 25% Rejected | 66215 |
| < 25% Rejected | 34079 |
| No Previous App. | 16847 |

| TotalDefaults_cats | count |
|---|---|
| No Defaults | 304114 |
| 1 Defaulted Loans | 3397 |
The chart below shows the default rate based on whether the applicant has previously applied for loans with Home Credit:
No Previous App. - no previous applications found for the client (i.e. new clients)
All Accepted - all previous applications were accepted
< 25% Rejected - fewer than 1/4 of previous applications were rejected
> 25% Rejected - more than 1/4 of previous applications were rejected
Interestingly, while applicants with previously rejected applications are significantly more likely to default once they are finally given a loan, previous clients with no rejected applications have a higher default risk than new clients.
This likely limits the usefulness of the previous_application table, because only a small proportion of clients have previously rejected applications.
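To make the comparison above concrete, here is a minimal sketch of how the four categories and their default rates could be computed. It is an illustration under stated assumptions, not the notebook's actual code: the function names are hypothetical, only accepted and rejected applications are counted, and a rejection share of exactly 25% is assigned to "> 25% Rejected" (the source does not say which side that boundary belongs to). The toy rows are invented.

```python
from collections import defaultdict


def rejection_category(n_accepted, n_rejected):
    """Map previous-application counts to the four categories listed above.

    Assumptions (not confirmed by the source): only accepted and rejected
    applications are counted, and a share of exactly 25% falls into
    "> 25% Rejected".
    """
    total = n_accepted + n_rejected
    if total == 0:
        return "No Previous App."
    if n_rejected == 0:
        return "All Accepted"
    return "< 25% Rejected" if n_rejected / total < 0.25 else "> 25% Rejected"


def default_rate_by_category(rows):
    """rows: iterable of (n_accepted, n_rejected, target), target in {0, 1}."""
    totals = defaultdict(lambda: [0, 0])  # category -> [defaults, clients]
    for n_accepted, n_rejected, target in rows:
        bucket = totals[rejection_category(n_accepted, n_rejected)]
        bucket[0] += target
        bucket[1] += 1
    return {cat: defaults / n for cat, (defaults, n) in totals.items()}


# Toy rows invented for illustration only; they are not Home Credit data.
toy_rows = [
    (0, 0, 0), (0, 0, 0),  # new clients
    (3, 0, 0), (5, 0, 1),  # all previous applications accepted
    (9, 1, 0),             # 10% rejected
    (1, 1, 1), (2, 2, 0),  # 50% rejected
]
rates = default_rate_by_category(toy_rows)
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}")
```

With pandas, the same per-category rate would usually be a one-liner such as `df.groupby("PrevRatioRejectedAccepted_cats")["TARGET"].mean()`, where `TARGET` is an assumed name for the 0/1 default label.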
Full DS size: 307511
'Distribution of Samples'
dict_keys(['LGBM_Dart_AUC_NEW'])
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\entityset\entityset.py:1914: UserWarning: index SK_BUREAU_ID not found in dataframe, creating new integer column
warnings.warn(
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function min at 0x000001D77E574400> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
).agg(to_agg)
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function max at 0x000001D77E5742C0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
).agg(to_agg)
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function mean at 0x000001D77E574CC0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
).agg(to_agg)
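The warnings above are raised inside woodwork and featuretools, so they repeat for every column or aggregation that is processed. If they are understood and only add noise, one option is to filter them by message prefix with the standard `warnings` module. The sketch below is an assumption about how that could be done, not something the notebook does; `quiet_known_warnings` is a hypothetical helper, and the prefixes are copied from the messages above.

```python
import warnings


def quiet_known_warnings():
    """Ignore the two repeated warning families seen in the output above.

    `message` is a regular expression matched against the start of the
    warning text, so the prefixes below are sufficient.
    """
    warnings.filterwarnings(
        "ignore", message="Could not infer format", category=UserWarning
    )
    warnings.filterwarnings(
        "ignore", message="The provided callable", category=FutureWarning
    )


# Demonstration: record everything that is NOT filtered out.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    quiet_known_warnings()
    warnings.warn(
        "Could not infer format, so each element will be parsed individually",
        UserWarning,
    )
    warnings.warn("some other warning", UserWarning)

print([str(w.message) for w in caught])
```

Filtering by message rather than by category keeps unrelated `UserWarning`s visible, such as the `SK_BUREAU_ID` index warning, which may point at a real data issue.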
Appending previous history
[I 2024-04-30 14:10:40,243] A new study created in memory with name: no-name-4805c4d6-1ec9-4777-906c-84869c0c80c2
drop drop_cols_post_proc: 227
after drop_cols_post_proc: 121
Full DS size: 307511
Tunning: - transformers: transformers: 0
total options: 0
[]
search_grid:
{}
- model: LGBM_Dart_AUC_NEW n_iters=10 with:
{'model__class_weight': ['balanced', None], 'model__objective': ['binary'], 'model__boosting_type': ['gbdt', 'rf', 'dart'], 'model__n_estimators': Range(50, 1000, 50, int), 'model__learning_rate': Range(0.01, 0.3, 0.01, float), 'model__max_depth': Range(3, 11, 1, int), 'model__num_leaves': Range(8, 256, 8, int), 'model__min_gain_to_split': Range(0.0, 15.0, 0.5, float), 'model__min_data_in_leaf': Range(0, 3000, 100, int), 'model__lambda_l1': Range(0, 110, 5, int), 'model__lambda_l2': Range(0, 110, 5, int), 'model__bagging_fraction': Range(0.2, 1.0, 0.1, float), 'model__feature_fraction': Range(0.2, 1.0, 0.1, float), 'model__max_bin': Range(50, 500, 25, int), 'model__drop_rate': Range(0.0, 1.0, 0.025, float)}
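The search space above mixes categorical lists with `Range(low, high, step, type)` entries. As a rough illustration of how one candidate configuration could be drawn from such a grid, the sketch below defines a stand-in `Range` class and samples uniformly at random. Both are assumptions: the notebook's own `Range` implementation is not shown, and the study log line suggests an Optuna study drives the real search, so plain random sampling from the standard library is swapped in here only to keep the example self-contained. The upper bound is treated as inclusive because the trial logs contain values such as `max_depth: 11` and `max_bin: 500`.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Stand-in for the notebook's Range(low, high, step, type); hypothetical."""

    low: float
    high: float
    step: float
    kind: type  # int or float

    def values(self):
        # Inclusive upper bound, inferred from values seen in the trial logs.
        n_steps = int(round((self.high - self.low) / self.step))
        return [self.kind(self.low + i * self.step) for i in range(n_steps + 1)]


def sample_params(grid, rng):
    """Draw one value per parameter, uniformly over its allowed values."""
    params = {}
    for name, space in grid.items():
        choices = space.values() if isinstance(space, Range) else list(space)
        params[name] = rng.choice(choices)
    return params


# A subset of the search grid printed above.
grid = {
    "model__class_weight": ["balanced", None],
    "model__boosting_type": ["gbdt", "rf", "dart"],
    "model__n_estimators": Range(50, 1000, 50, int),
    "model__learning_rate": Range(0.01, 0.3, 0.01, float),
    "model__max_depth": Range(3, 11, 1, int),
}
params = sample_params(grid, random.Random(0))
print(params)
```

Building float grids as `low + i * step` also explains values like `0.30000000000000004` in the trial logs: they are ordinary floating-point rounding, not distinct settings.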
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7248, std_test_score:0.00209 train_set_score:0.7308
folds val/train: [0.727, 0.7253, 0.7214, 0.7266, 0.7235] / [0.7308, 0.7311, 0.732, 0.7298, 0.7304], mean fold time: 1.24
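The `Tune:` summary can be reproduced from the per-fold scores on the line above it. The sketch below recomputes the mean validation score, its spread, and the train/validation gap for this trial. Using the population standard deviation is an assumption about how `std_test_score` is computed; because the fold scores are printed rounded to four decimals, the result only approximately matches the logged 0.00209.

```python
from statistics import fmean, pstdev

# Per-fold scores copied from the "folds val/train" line above.
val_scores = [0.727, 0.7253, 0.7214, 0.7266, 0.7235]
train_scores = [0.7308, 0.7311, 0.732, 0.7298, 0.7304]

val_mean = fmean(val_scores)
val_std = pstdev(val_scores)  # population std (ddof=0); an assumption
train_mean = fmean(train_scores)
gap = train_mean - val_mean  # a larger gap hints at more overfitting

print(f"val_score={val_mean:.4f} std={val_std:.5f} "
      f"train_set_score={train_mean:.4f} gap={gap:.4f}")
```

The same gap is a quick way to compare trials: a configuration with a slightly higher validation score but a much larger train/validation gap may generalize less reliably.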
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\sklearn\metrics\_classification.py:1509: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-04-30 14:10:47,163] Trial 0 finished with value: 0.7247591221654156 and parameters: {'model__class_weight': None, 'model__objective': 'binary', 'model__boosting_type': 'rf', 'model__n_estimators': 900, 'model__learning_rate': 0.08, 'model__max_depth': 11, 'model__num_leaves': 112, 'model__min_gain_to_split': 12.5, 'model__min_data_in_leaf': 2700, 'model__lambda_l1': 50, 'model__lambda_l2': 15, 'model__bagging_fraction': 0.4, 'model__feature_fraction': 0.8, 'model__max_bin': 225, 'model__drop_rate': 0.55}. Best is trial 0 with value: 0.7247591221654156.
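Because each trial is reported as one long log line, it can help to parse these lines into records before comparing configurations. The sketch below is one possible way to do that with the standard library; `parse_trial` and the regular expression are hypothetical, written against the line format shown above, and the sample line is an abridged copy of the Trial 0 entry (its parameter dict is shortened).

```python
import ast
import re

# Matches the "Trial N finished with value: X and parameters: {...}" format.
TRIAL_RE = re.compile(
    r"Trial (?P<number>\d+) finished with value: (?P<value>[0-9.eE+-]+) "
    r"and parameters: (?P<params>\{.*?\})\. Best is trial"
)


def parse_trial(line):
    """Return {'number', 'value', 'params'} for a trial line, else None."""
    match = TRIAL_RE.search(line)
    if match is None:
        return None
    return {
        "number": int(match["number"]),
        "value": float(match["value"]),
        # The parameter dict is a plain Python literal, so literal_eval is safe.
        "params": ast.literal_eval(match["params"]),
    }


# Abridged copy of the Trial 0 line above (parameter dict shortened).
line = (
    "[I 2024-04-30 14:10:47,163] Trial 0 finished with value: 0.7247591221654156 "
    "and parameters: {'model__class_weight': None, 'model__boosting_type': 'rf'}. "
    "Best is trial 0 with value: 0.7247591221654156."
)
record = parse_trial(line)
print(record["number"], record["value"], record["params"]["model__boosting_type"])
```

Collecting the records for all trials makes it straightforward to group scores by `model__boosting_type` or `model__class_weight`.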
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\lightgbm\callback.py:325: UserWarning: Early stopping is not available in dart mode
_log_warning('Early stopping is not available in dart mode')
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7332, std_test_score:0.00291 train_set_score:0.7374
folds val/train: [0.7347, 0.732, 0.729, 0.7378, 0.7323] / [0.7368, 0.7383, 0.7361, 0.7382, 0.7374], mean fold time: 5.41
[I 2024-04-30 14:11:14,958] Trial 1 finished with value: 0.733164335671028 and parameters: {'model__class_weight': None, 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 500, 'model__learning_rate': 0.01, 'model__max_depth': 4, 'model__num_leaves': 216, 'model__min_gain_to_split': 7.0, 'model__min_data_in_leaf': 1400, 'model__lambda_l1': 60, 'model__lambda_l2': 75, 'model__bagging_fraction': 0.5, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 200, 'model__drop_rate': 0.05}. Best is trial 1 with value: 0.733164335671028.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7266, std_test_score:0.00277 train_set_score:0.7356
folds val/train: [0.7285, 0.7245, 0.7238, 0.7311, 0.7252] / [0.7353, 0.7356, 0.7359, 0.7363, 0.7348], mean fold time: 1.25
[I 2024-04-30 14:11:21,928] Trial 2 finished with value: 0.726644314510903 and parameters: {'model__class_weight': None, 'model__objective': 'binary', 'model__boosting_type': 'rf', 'model__n_estimators': 750, 'model__learning_rate': 0.2, 'model__max_depth': 7, 'model__num_leaves': 160, 'model__min_gain_to_split': 10.5, 'model__min_data_in_leaf': 1500, 'model__lambda_l1': 30, 'model__lambda_l2': 20, 'model__bagging_fraction': 0.9000000000000001, 'model__feature_fraction': 0.2, 'model__max_bin': 175, 'model__drop_rate': 0.30000000000000004}. Best is trial 1 with value: 0.733164335671028.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7681, std_test_score:0.00153 train_set_score:0.8038
folds val/train: [0.7694, 0.7654, 0.7673, 0.7693, 0.769] / [0.8039, 0.8033, 0.8042, 0.8038, 0.8038], mean fold time: 21.81
[I 2024-04-30 14:13:11,758] Trial 3 finished with value: 0.7680802042485764 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 350, 'model__learning_rate': 0.08, 'model__max_depth': 9, 'model__num_leaves': 144, 'model__min_gain_to_split': 7.5, 'model__min_data_in_leaf': 700, 'model__lambda_l1': 90, 'model__lambda_l2': 0, 'model__bagging_fraction': 0.6000000000000001, 'model__feature_fraction': 0.4, 'model__max_bin': 375, 'model__drop_rate': 0.6000000000000001}. Best is trial 3 with value: 0.7680802042485764.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7713, std_test_score:0.00130 train_set_score:0.8122
folds val/train: [0.7734, 0.7696, 0.7704, 0.771, 0.7718] / [0.8116, 0.8118, 0.8124, 0.813, 0.8122], mean fold time: 2.36
[I 2024-04-30 14:13:24,340] Trial 4 finished with value: 0.771269722930656 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 950, 'model__learning_rate': 0.11, 'model__max_depth': 6, 'model__num_leaves': 224, 'model__min_gain_to_split': 13.0, 'model__min_data_in_leaf': 0, 'model__lambda_l1': 15, 'model__lambda_l2': 80, 'model__bagging_fraction': 0.9000000000000001, 'model__feature_fraction': 0.4, 'model__max_bin': 500, 'model__drop_rate': 0.875}. Best is trial 4 with value: 0.771269722930656.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7296, std_test_score:0.00302 train_set_score:0.7395
folds val/train: [0.7331, 0.7296, 0.7247, 0.7322, 0.7283] / [0.7379, 0.7399, 0.7407, 0.7397, 0.7391], mean fold time: 1.35
[I 2024-04-30 14:13:31,840] Trial 5 finished with value: 0.7295734635518236 and parameters: {'model__class_weight': None, 'model__objective': 'binary', 'model__boosting_type': 'rf', 'model__n_estimators': 600, 'model__learning_rate': 0.17, 'model__max_depth': 11, 'model__num_leaves': 256, 'model__min_gain_to_split': 4.0, 'model__min_data_in_leaf': 300, 'model__lambda_l1': 110, 'model__lambda_l2': 85, 'model__bagging_fraction': 1.0, 'model__feature_fraction': 0.7, 'model__max_bin': 275, 'model__drop_rate': 0.325}. Best is trial 4 with value: 0.771269722930656.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7702, std_test_score:0.00128 train_set_score:0.8221
folds val/train: [0.7725, 0.7697, 0.7689, 0.7692, 0.7706] / [0.8388, 0.8306, 0.8252, 0.8057, 0.81], mean fold time: 2.27
[I 2024-04-30 14:13:44,003] Trial 6 finished with value: 0.7701777822813336 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 450, 'model__learning_rate': 0.24000000000000002, 'model__max_depth': 8, 'model__num_leaves': 24, 'model__min_gain_to_split': 4.5, 'model__min_data_in_leaf': 2800, 'model__lambda_l1': 0, 'model__lambda_l2': 70, 'model__bagging_fraction': 0.30000000000000004, 'model__feature_fraction': 1.0, 'model__max_bin': 450, 'model__drop_rate': 0.225}. Best is trial 4 with value: 0.771269722930656.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7646, std_test_score:0.00148 train_set_score:0.7801
folds val/train: [0.765, 0.7623, 0.7635, 0.7657, 0.7664] / [0.7801, 0.7805, 0.7804, 0.78, 0.7794], mean fold time: 3.33
[I 2024-04-30 14:14:01,738] Trial 7 finished with value: 0.7645755916988988 and parameters: {'model__class_weight': None, 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 850, 'model__learning_rate': 0.11, 'model__max_depth': 8, 'model__num_leaves': 192, 'model__min_gain_to_split': 14.0, 'model__min_data_in_leaf': 2600, 'model__lambda_l1': 25, 'model__lambda_l2': 105, 'model__bagging_fraction': 0.2, 'model__feature_fraction': 0.9000000000000001, 'model__max_bin': 225, 'model__drop_rate': 0.17500000000000002}. Best is trial 4 with value: 0.771269722930656.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7633, std_test_score:0.00208 train_set_score:0.7764
folds val/train: [0.7656, 0.7601, 0.7617, 0.764, 0.765] / [0.7759, 0.7759, 0.7766, 0.777, 0.7768], mean fold time: 1.92
[I 2024-04-30 14:14:12,368] Trial 8 finished with value: 0.7632609111571824 and parameters: {'model__class_weight': None, 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 550, 'model__learning_rate': 0.28, 'model__max_depth': 5, 'model__num_leaves': 240, 'model__min_gain_to_split': 14.5, 'model__min_data_in_leaf': 0, 'model__lambda_l1': 20, 'model__lambda_l2': 90, 'model__bagging_fraction': 1.0, 'model__feature_fraction': 0.6000000000000001, 'model__max_bin': 100, 'model__drop_rate': 0.9}. Best is trial 4 with value: 0.771269722930656.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7257, std_test_score:0.00200 train_set_score:0.7356
folds val/train: [0.7258, 0.7237, 0.7239, 0.7293, 0.7258] / [0.7357, 0.7354, 0.7359, 0.7356, 0.7351], mean fold time: 2.34
[I 2024-04-30 14:14:25,096] Trial 9 finished with value: 0.7256979936912922 and parameters: {'model__class_weight': None, 'model__objective': 'binary', 'model__boosting_type': 'rf', 'model__n_estimators': 650, 'model__learning_rate': 0.05, 'model__max_depth': 10, 'model__num_leaves': 56, 'model__min_gain_to_split': 7.0, 'model__min_data_in_leaf': 2100, 'model__lambda_l1': 15, 'model__lambda_l2': 5, 'model__bagging_fraction': 0.30000000000000004, 'model__feature_fraction': 0.9000000000000001, 'model__max_bin': 75, 'model__drop_rate': 0.625}. Best is trial 4 with value: 0.771269722930656.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7638, std_test_score:0.00140 train_set_score:0.7937
folds val/train: [0.7656, 0.7613, 0.7638, 0.7642, 0.7643] / [0.7935, 0.7939, 0.794, 0.7938, 0.7933], mean fold time: 1.91
[I 2024-04-30 14:14:35,466] Trial 10 finished with value: 0.7638478396414092 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 50, 'model__learning_rate': 0.12, 'model__max_depth': 6, 'model__num_leaves': 88, 'model__min_gain_to_split': 1.0, 'model__min_data_in_leaf': 700, 'model__lambda_l1': 50, 'model__lambda_l2': 45, 'model__bagging_fraction': 0.7, 'model__feature_fraction': 0.5, 'model__max_bin': 500, 'model__drop_rate': 0.9750000000000001}. Best is trial 4 with value: 0.771269722930656.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7709, std_test_score:0.00211 train_set_score:0.7988
folds val/train: [0.7743, 0.7703, 0.7685, 0.7691, 0.7722] / [0.807, 0.8008, 0.7974, 0.7913, 0.7973], mean fold time: 2.18
[I 2024-04-30 14:14:47,169] Trial 11 finished with value: 0.7708781891661243 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 300, 'model__learning_rate': 0.23, 'model__max_depth': 3, 'model__num_leaves': 8, 'model__min_gain_to_split': 3.5, 'model__min_data_in_leaf': 2100, 'model__lambda_l1': 0, 'model__lambda_l2': 60, 'model__bagging_fraction': 0.8, 'model__feature_fraction': 1.0, 'model__max_bin': 475, 'model__drop_rate': 0.775}. Best is trial 4 with value: 0.771269722930656.
Fold: Tuning: n_train=246008, eval_set=61503 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7730, std_test_score:0.00203 train_set_score:0.7982 folds val/train: [0.7767, 0.7722, 0.7707, 0.7722, 0.7731] / [0.8062, 0.797, 0.795, 0.7971, 0.7957], mean fold time: 6.12
[I 2024-04-30 14:15:19,006] Trial 12 finished with value: 0.7729736600713923 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 250, 'model__learning_rate': 0.21000000000000002, 'model__max_depth': 3, 'model__num_leaves': 16, 'model__min_gain_to_split': 0.0, 'model__min_data_in_leaf': 2000, 'model__lambda_l1': 0, 'model__lambda_l2': 50, 'model__bagging_fraction': 0.8, 'model__feature_fraction': 0.5, 'model__max_bin': 375, 'model__drop_rate': 0.8}. Best is trial 12 with value: 0.7729736600713923.
Fold: Tuning: n_train=246008, eval_set=61503 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7673, std_test_score:0.00125 train_set_score:0.7823 folds val/train: [0.7684, 0.7661, 0.7661, 0.7666, 0.7692] / [0.782, 0.7827, 0.7823, 0.7819, 0.7825], mean fold time: 3.38
[I 2024-04-30 14:15:36,908] Trial 13 finished with value: 0.7672730632046433 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 100, 'model__learning_rate': 0.16, 'model__max_depth': 3, 'model__num_leaves': 184, 'model__min_gain_to_split': 0.0, 'model__min_data_in_leaf': 2000, 'model__lambda_l1': 0, 'model__lambda_l2': 40, 'model__bagging_fraction': 0.8, 'model__feature_fraction': 0.5, 'model__max_bin': 400, 'model__drop_rate': 0.775}. Best is trial 12 with value: 0.7729736600713923.
Fold: Tuning: n_train=246008, eval_set=61503 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7689, std_test_score:0.00186 train_set_score:0.7956 folds val/train: [0.7712, 0.7661, 0.7674, 0.7699, 0.7697] / [0.7959, 0.7955, 0.7953, 0.7962, 0.7952], mean fold time: 2.21
[I 2024-04-30 14:15:48,764] Trial 14 finished with value: 0.7688614622014904 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 1000, 'model__learning_rate': 0.3, 'model__max_depth': 5, 'model__num_leaves': 80, 'model__min_gain_to_split': 10.0, 'model__min_data_in_leaf': 1100, 'model__lambda_l1': 65, 'model__lambda_l2': 35, 'model__bagging_fraction': 0.8, 'model__feature_fraction': 0.4, 'model__max_bin': 350, 'model__drop_rate': 0.8}. Best is trial 12 with value: 0.7729736600713923.
Fold: Tuning: n_train=246008, eval_set=61503 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7710, std_test_score:0.00142 train_set_score:0.7985 folds val/train: [0.7735, 0.7698, 0.7705, 0.7697, 0.7717] / [0.7988, 0.7982, 0.7994, 0.7984, 0.7977], mean fold time: 1.99
[I 2024-04-30 14:15:59,500] Trial 15 finished with value: 0.7710315875391158 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 200, 'model__learning_rate': 0.19, 'model__max_depth': 6, 'model__num_leaves': 120, 'model__min_gain_to_split': 10.0, 'model__min_data_in_leaf': 1900, 'model__lambda_l1': 35, 'model__lambda_l2': 60, 'model__bagging_fraction': 0.6000000000000001, 'model__feature_fraction': 0.2, 'model__max_bin': 325, 'model__drop_rate': 1.0}. Best is trial 12 with value: 0.7729736600713923.
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\lightgbm\callback.py:325: UserWarning: Early stopping is not available in dart mode
_log_warning('Early stopping is not available in dart mode')
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7676, std_test_score:0.00100 train_set_score:0.7938 folds val/train: [0.7684, 0.7662, 0.7665, 0.7681, 0.7687] / [0.7934, 0.7942, 0.7944, 0.7936, 0.7934], mean fold time: 38.04
[I 2024-04-30 14:19:10,473] Trial 16 finished with value: 0.7675725766548119 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 300, 'model__learning_rate': 0.13, 'model__max_depth': 4, 'model__num_leaves': 48, 'model__min_gain_to_split': 2.0, 'model__min_data_in_leaf': 1000, 'model__lambda_l1': 10, 'model__lambda_l2': 100, 'model__bagging_fraction': 0.9000000000000001, 'model__feature_fraction': 0.6000000000000001, 'model__max_bin': 425, 'model__drop_rate': 0.42500000000000004}. Best is trial 12 with value: 0.7729736600713923.
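The DART trial above triggers LightGBM's `Early stopping is not available in dart mode` message, which the log shows being raised as a `UserWarning`. A minimal sketch of one way to keep the tuning output readable is to filter that single message with the standard `warnings` module. The helper name `silence_dart_early_stopping_warning` is hypothetical and not part of the original notebook, and the filter may need to be re-applied in worker processes if folds run in parallel.

```python
import warnings

# Message text copied from the log above; filterwarnings treats it as a
# regular expression matched against the start of the warning message.
DART_EARLY_STOPPING_MESSAGE = "Early stopping is not available in dart mode"


def silence_dart_early_stopping_warning():
    """Ignore only the DART early-stopping UserWarning; other warnings still surface."""
    warnings.filterwarnings(
        "ignore",
        message=DART_EARLY_STOPPING_MESSAGE,
        category=UserWarning,
    )
```

Calling the helper once before the study starts would be enough in a single-process run. An alternative design is to skip the early-stopping callback altogether whenever `boosting_type` is `'dart'`, which avoids the warning at its source.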
Fold: Tuning: n_train=246008, eval_set=61503 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7691, std_test_score:0.00187 train_set_score:0.7993 folds val/train: [0.7723, 0.7666, 0.7685, 0.7687, 0.7696] / [0.7991, 0.7992, 0.8001, 0.7993, 0.7987], mean fold time: 1.62
[I 2024-04-30 14:19:19,385] Trial 17 finished with value: 0.7691357518070465 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 750, 'model__learning_rate': 0.25, 'model__max_depth': 6, 'model__num_leaves': 176, 'model__min_gain_to_split': 11.5, 'model__min_data_in_leaf': 2400, 'model__lambda_l1': 40, 'model__lambda_l2': 75, 'model__bagging_fraction': 0.7, 'model__feature_fraction': 0.4, 'model__max_bin': 300, 'model__drop_rate': 0.7000000000000001}. Best is trial 12 with value: 0.7729736600713923.
Fold: Tuning: n_train=246008, eval_set=61503 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7712, std_test_score:0.00147 train_set_score:0.7943 folds val/train: [0.7729, 0.7687, 0.7706, 0.7714, 0.7724] / [0.7932, 0.7941, 0.796, 0.7939, 0.7941], mean fold time: 2.13
[I 2024-04-30 14:19:31,234] Trial 18 finished with value: 0.7712175359581087 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'gbdt', 'model__n_estimators': 200, 'model__learning_rate': 0.19, 'model__max_depth': 4, 'model__num_leaves': 216, 'model__min_gain_to_split': 8.0, 'model__min_data_in_leaf': 1500, 'model__lambda_l1': 75, 'model__lambda_l2': 45, 'model__bagging_fraction': 0.9000000000000001, 'model__feature_fraction': 0.5, 'model__max_bin': 500, 'model__drop_rate': 0.875}. Best is trial 12 with value: 0.7729736600713923.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7732, std_test_score:0.00156 train_set_score:0.8209 folds val/train: [0.7756, 0.7711, 0.7719, 0.7735, 0.7739] / [0.8221, 0.8203, 0.8207, 0.8213, 0.8203], mean fold time: 25.83
[I 2024-04-30 14:21:41,235] Trial 19 finished with value: 0.7732021037088023 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 400, 'model__learning_rate': 0.14, 'model__max_depth': 7, 'model__num_leaves': 88, 'model__min_gain_to_split': 13.5, 'model__min_data_in_leaf': 0, 'model__lambda_l1': 10, 'model__lambda_l2': 90, 'model__bagging_fraction': 0.7, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 425, 'model__drop_rate': 0.45}. Best is trial 19 with value: 0.7732021037088023.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7737, std_test_score:0.00150 train_set_score:0.8402 folds val/train: [0.7753, 0.7722, 0.7717, 0.7744, 0.7751] / [0.8404, 0.8396, 0.8411, 0.8404, 0.8397], mean fold time: 19.83
[I 2024-04-30 14:23:21,211] Trial 20 finished with value: 0.773736619748256 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 400, 'model__learning_rate': 0.22, 'model__max_depth': 8, 'model__num_leaves': 40, 'model__min_gain_to_split': 5.0, 'model__min_data_in_leaf': 3000, 'model__lambda_l1': 10, 'model__lambda_l2': 110, 'model__bagging_fraction': 0.5, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 425, 'model__drop_rate': 0.45}. Best is trial 20 with value: 0.773736619748256.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7745, std_test_score:0.00123 train_set_score:0.8443 folds val/train: [0.776, 0.7732, 0.7729, 0.7746, 0.7756] / [0.8443, 0.843, 0.844, 0.8456, 0.8444], mean fold time: 19.61
[I 2024-04-30 14:25:00,094] Trial 21 finished with value: 0.7744710826089047 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 450, 'model__learning_rate': 0.23, 'model__max_depth': 8, 'model__num_leaves': 48, 'model__min_gain_to_split': 5.5, 'model__min_data_in_leaf': 2900, 'model__lambda_l1': 10, 'model__lambda_l2': 110, 'model__bagging_fraction': 0.5, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 400, 'model__drop_rate': 0.42500000000000004}. Best is trial 21 with value: 0.7744710826089047.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7732, std_test_score:0.00163 train_set_score:0.8425 folds val/train: [0.7754, 0.7708, 0.7719, 0.7734, 0.7744] / [0.8433, 0.8433, 0.8425, 0.8428, 0.8406], mean fold time: 18.03
[I 2024-04-30 14:26:31,031] Trial 22 finished with value: 0.7731975885382104 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 400, 'model__learning_rate': 0.25, 'model__max_depth': 8, 'model__num_leaves': 56, 'model__min_gain_to_split': 5.5, 'model__min_data_in_leaf': 2900, 'model__lambda_l1': 10, 'model__lambda_l2': 110, 'model__bagging_fraction': 0.5, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 450, 'model__drop_rate': 0.42500000000000004}. Best is trial 21 with value: 0.7744710826089047.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7736, std_test_score:0.00140 train_set_score:0.8264 folds val/train: [0.7753, 0.7721, 0.7719, 0.7739, 0.7748] / [0.8266, 0.8246, 0.8273, 0.8272, 0.8262], mean fold time: 20.32
[I 2024-04-30 14:28:13,425] Trial 23 finished with value: 0.7736188555317012 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 400, 'model__learning_rate': 0.27, 'model__max_depth': 9, 'model__num_leaves': 80, 'model__min_gain_to_split': 5.5, 'model__min_data_in_leaf': 2400, 'model__lambda_l1': 40, 'model__lambda_l2': 95, 'model__bagging_fraction': 0.5, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 425, 'model__drop_rate': 0.45}. Best is trial 21 with value: 0.7744710826089047.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7743, std_test_score:0.00114 train_set_score:0.8227 folds val/train: [0.7754, 0.7741, 0.7725, 0.7737, 0.7756] / [0.8235, 0.8217, 0.8219, 0.8236, 0.8225], mean fold time: 24.82
[I 2024-04-30 14:30:18,349] Trial 24 finished with value: 0.7742569892640564 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 500, 'model__learning_rate': 0.28, 'model__max_depth': 9, 'model__num_leaves': 40, 'model__min_gain_to_split': 5.5, 'model__min_data_in_leaf': 3000, 'model__lambda_l1': 40, 'model__lambda_l2': 100, 'model__bagging_fraction': 0.5, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 350, 'model__drop_rate': 0.35000000000000003}. Best is trial 21 with value: 0.7744710826089047.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7730, std_test_score:0.00176 train_set_score:0.8582 folds val/train: [0.776, 0.7708, 0.772, 0.7735, 0.7727] / [0.8573, 0.8581, 0.859, 0.8597, 0.8572], mean fold time: 35.85
[I 2024-04-30 14:33:18,399] Trial 25 finished with value: 0.772998044701975 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 700, 'model__learning_rate': 0.3, 'model__max_depth': 9, 'model__num_leaves': 40, 'model__min_gain_to_split': 2.5, 'model__min_data_in_leaf': 3000, 'model__lambda_l1': 25, 'model__lambda_l2': 110, 'model__bagging_fraction': 0.4, 'model__feature_fraction': 0.2, 'model__max_bin': 375, 'model__drop_rate': 0.325}. Best is trial 21 with value: 0.7744710826089047.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7740, std_test_score:0.00162 train_set_score:0.8036 folds val/train: [0.7761, 0.7716, 0.7726, 0.7746, 0.7748] / [0.8038, 0.8028, 0.804, 0.8039, 0.8036], mean fold time: 19.51
[I 2024-04-30 14:34:56,749] Trial 26 finished with value: 0.7739512652732408 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 500, 'model__learning_rate': 0.22, 'model__max_depth': 10, 'model__num_leaves': 32, 'model__min_gain_to_split': 8.5, 'model__min_data_in_leaf': 2500, 'model__lambda_l1': 45, 'model__lambda_l2': 100, 'model__bagging_fraction': 0.4, 'model__feature_fraction': 0.2, 'model__max_bin': 325, 'model__drop_rate': 0.15000000000000002}. Best is trial 21 with value: 0.7744710826089047.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7738, std_test_score:0.00148 train_set_score:0.7986 folds val/train: [0.7757, 0.772, 0.7726, 0.7734, 0.7754] / [0.7985, 0.7979, 0.7994, 0.7984, 0.7986], mean fold time: 5.78
[I 2024-04-30 14:35:26,433] Trial 27 finished with value: 0.7738168190120491 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 550, 'model__learning_rate': 0.27, 'model__max_depth': 10, 'model__num_leaves': 64, 'model__min_gain_to_split': 8.0, 'model__min_data_in_leaf': 2500, 'model__lambda_l1': 75, 'model__lambda_l2': 100, 'model__bagging_fraction': 0.4, 'model__feature_fraction': 0.2, 'model__max_bin': 300, 'model__drop_rate': 0.025}. Best is trial 21 with value: 0.7744710826089047.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502 Tune: val_score:0.7751, std_test_score:0.00160 train_set_score:0.8100 folds val/train: [0.7774, 0.7736, 0.7732, 0.7748, 0.7763] / [0.8103, 0.8095, 0.81, 0.81, 0.8101], mean fold time: 21.19
[I 2024-04-30 14:37:13,206] Trial 28 finished with value: 0.77509084497967 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 600, 'model__learning_rate': 0.26, 'model__max_depth': 10, 'model__num_leaves': 24, 'model__min_gain_to_split': 6.0, 'model__min_data_in_leaf': 2300, 'model__lambda_l1': 45, 'model__lambda_l2': 95, 'model__bagging_fraction': 0.30000000000000004, 'model__feature_fraction': 0.2, 'model__max_bin': 325, 'model__drop_rate': 0.125}. Best is trial 28 with value: 0.77509084497967.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Fold: Tuning: n_train=246009, eval_set=61502
Tune: val_score:0.7750, std_test_score:0.00135 train_set_score:0.8244 folds val/train: [0.7767, 0.7738, 0.7732, 0.7752, 0.7762] / [0.824, 0.8238, 0.8252, 0.8246, 0.8244], mean fold time: 33.81
[I 2024-04-30 14:40:03,043] Trial 29 finished with value: 0.7750172313526884 and parameters: {'model__class_weight': 'balanced', 'model__objective': 'binary', 'model__boosting_type': 'dart', 'model__n_estimators': 850, 'model__learning_rate': 0.28, 'model__max_depth': 11, 'model__num_leaves': 112, 'model__min_gain_to_split': 6.0, 'model__min_data_in_leaf': 2700, 'model__lambda_l1': 55, 'model__lambda_l2': 65, 'model__bagging_fraction': 0.2, 'model__feature_fraction': 0.7, 'model__max_bin': 250, 'model__drop_rate': 0.125}. Best is trial 28 with value: 0.77509084497967.
Fold: Tuning: n_train=246008, eval_set=61503
Fold: Tuning: n_train=246009, eval_set=61502
<module 'Draft.feature_builder_v2' from 'V:\\projects\\ppuodz-ML.4.1\\Draft\\feature_builder_v2.py'>
Full DS size: 307511 -- TEST SIZE: 23064
Baseline_Only_CreditRatings: 2.3 seconds
Full DS size: 307511 -- TEST SIZE: 23064
LGBM_AUC_Base_Features: 20.7 seconds
drop drop_cols_post_proc: 121
after drop_cols_post_proc: 59
Full DS size: 307511 -- TEST SIZE: 23064
LGBM_Weighted_LogLoss: 18.8 seconds
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\woodwork\type_sys\utils.py:40: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\entityset\entityset.py:1914: UserWarning: index SK_BUREAU_ID not found in dataframe, creating new integer column
warnings.warn(
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function min at 0x00000207FE1545E0> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
).agg(to_agg)
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function mean at 0x00000207FE154EA0> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
).agg(to_agg)
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\featuretools\computational_backends\feature_set_calculator.py:828: FutureWarning: The provided callable <function max at 0x00000207FE1544A0> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
).agg(to_agg)
Appending previous history
drop drop_cols_post_proc: 227
after drop_cols_post_proc: 121
Full DS size: 307511 -- TEST SIZE: 23064
LGBM_AUC: 73.7 seconds
Appending previous history
drop drop_cols_post_proc: 227
after drop_cols_post_proc: 121
Full DS size: 307511 -- TEST SIZE: 23064
C:\Users\Paulius\AppData\Local\pypoetry\Cache\virtualenvs\ppuodz-ml-4-1-dqELbViF-py3.12\Lib\site-packages\lightgbm\callback.py:325: UserWarning: Early stopping is not available in dart mode
_log_warning('Early stopping is not available in dart mode')
LGBM_Dart_AUC: 204.6 seconds
Appending previous history Full DS size: 307511 -- TEST SIZE: 23064 LGBM_AUC_All_Features: 57.0 seconds
Baseline_Only_CreditRatings:
{'model__n_estimators': 300, 'model__learning_rate': 0.25, 'model__max_depth': 3, 'model__num_leaves': 224, 'model__min_gain_to_split': 1.5, 'model__min_data_in_leaf': 100, 'model__lambda_l1': 0, 'model__lambda_l2': 50, 'model__bagging_fraction': 1.0, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 125}
LGBM_AUC_Base_Features:
{'model__n_estimators': 800, 'model__learning_rate': 0.060000000000000005, 'model__max_depth': 7, 'model__num_leaves': 48, 'model__min_gain_to_split': 2.5, 'model__min_data_in_leaf': 300, 'model__lambda_l1': 80, 'model__lambda_l2': 5, 'model__bagging_fraction': 0.8, 'model__feature_fraction': 0.8, 'model__max_bin': 350}
LGBM_Weighted_LogLoss:
{'model__n_estimators': 800, 'model__learning_rate': 0.06999999999999999, 'model__max_depth': 10, 'model__num_leaves': 224, 'model__min_gain_to_split': 3.5, 'model__min_data_in_leaf': 100, 'model__lambda_l1': 5, 'model__lambda_l2': 25, 'model__bagging_fraction': 0.4, 'model__feature_fraction': 0.6000000000000001, 'model__max_bin': 50}
LGBM_AUC:
{'model__n_estimators': 700, 'model__learning_rate': 0.04, 'model__max_depth': 9, 'model__num_leaves': 120, 'model__min_gain_to_split': 0.5, 'model__min_data_in_leaf': 400, 'model__lambda_l1': 105, 'model__lambda_l2': 5, 'model__bagging_fraction': 0.30000000000000004, 'model__feature_fraction': 0.30000000000000004, 'model__max_bin': 500}
LGBM_Dart_AUC:
{'model__n_estimators': 900, 'model__learning_rate': 0.27, 'model__max_depth': 4, 'model__num_leaves': 160, 'model__min_gain_to_split': 1.5, 'model__min_data_in_leaf': 2200, 'model__lambda_l1': 85, 'model__lambda_l2': 90, 'model__bagging_fraction': 0.5, 'model__feature_fraction': 1.0, 'model__max_bin': 250, 'model__drop_rate': 0.2}
LGBM_AUC_All_Features:
{'model__n_estimators': 700, 'model__learning_rate': 0.05, 'model__max_depth': 7, 'model__num_leaves': 224, 'model__min_gain_to_split': 4.0, 'model__min_data_in_leaf': 300, 'model__lambda_l1': 40, 'model__lambda_l2': 10, 'model__bagging_fraction': 0.7, 'model__feature_fraction': 0.8, 'model__max_bin': 100}
| | Model | auc | pr_auc | _f1_micro | _f1_macro | logloss | accuracy | precision_macro | recall_macro | f1_macro | target_f1 | target_recall | target_precision | fbeta_1.5 | fbeta_2.5 | fbeta_4.0 | log_loss | elapsed_time | total_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | LGBM_Dart_AUC | 0.775 | 0.263 | 0.731 | 0.561 | 9.711 | 0.731 | 0.573 | 0.705 | 0.561 | 0.288 | 0.674 | 0.183 | 0.369 | 0.492 | 0.582 | 0.540 | 204.6 | 21.866262 |
| 3 | LGBM_AUC | 0.773 | 0.261 | 0.738 | 0.564 | 9.448 | 0.738 | 0.573 | 0.702 | 0.564 | 0.289 | 0.660 | 0.185 | 0.369 | 0.487 | 0.573 | 0.531 | 73.7 | 23.586256 |
| 5 | LGBM_AUC_All_Features | 0.773 | 0.263 | 0.741 | 0.566 | 9.343 | 0.741 | 0.574 | 0.703 | 0.566 | 0.291 | 0.658 | 0.187 | 0.370 | 0.488 | 0.573 | 0.526 | 57.0 | 39.724908 |
| 1 | LGBM_AUC_Base_Features | 0.759 | 0.243 | 0.715 | 0.547 | 10.278 | 0.715 | 0.566 | 0.690 | 0.547 | 0.272 | 0.661 | 0.171 | 0.352 | 0.474 | 0.566 | 0.561 | 20.7 | 21.202541 |
| 2 | LGBM_Weighted_LogLoss | 0.755 | 0.239 | 0.748 | 0.564 | 9.073 | 0.748 | 0.570 | 0.684 | 0.564 | 0.281 | 0.608 | 0.182 | 0.354 | 0.460 | 0.535 | 0.520 | 18.8 | 11.854638 |
| 0 | Baseline_Only_CreditRatings | 0.723 | 0.203 | 0.670 | 0.516 | 11.904 | 0.670 | 0.553 | 0.665 | 0.516 | 0.244 | 0.660 | 0.150 | 0.322 | 0.449 | 0.550 | 0.613 | 2.3 | 2.143046 |
| | Model | auc | pr_auc | _f1_micro | _f1_macro | logloss | accuracy | precision_macro | recall_macro | f1_macro | target_f1 | target_recall | target_precision | fbeta_1.5 | fbeta_2.5 | fbeta_4.0 | log_loss | elapsed_time | total_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | LGBM_Dart_AUC | 0.790 | 0.282 | 0.730 | 0.564 | 9.745 | 0.730 | 0.577 | 0.718 | 0.564 | 0.296 | 0.705 | 0.188 | 0.381 | 0.511 | 0.607 | 0.541 | 204.6 | 21.866262 |
| 3 | LGBM_AUC | 0.789 | 0.276 | 0.735 | 0.567 | 9.539 | 0.735 | 0.577 | 0.715 | 0.567 | 0.297 | 0.692 | 0.189 | 0.380 | 0.506 | 0.598 | 0.533 | 73.7 | 23.586256 |
| 5 | LGBM_AUC_All_Features | 0.787 | 0.275 | 0.739 | 0.568 | 9.419 | 0.739 | 0.577 | 0.713 | 0.568 | 0.296 | 0.682 | 0.189 | 0.379 | 0.502 | 0.592 | 0.528 | 57.0 | 39.724908 |
| 1 | LGBM_AUC_Base_Features | 0.768 | 0.246 | 0.713 | 0.548 | 10.328 | 0.713 | 0.567 | 0.695 | 0.548 | 0.275 | 0.672 | 0.173 | 0.356 | 0.481 | 0.575 | 0.562 | 20.7 | 21.202541 |
| 2 | LGBM_Weighted_LogLoss | 0.763 | 0.240 | 0.733 | 0.558 | 9.608 | 0.733 | 0.569 | 0.691 | 0.558 | 0.280 | 0.641 | 0.179 | 0.357 | 0.473 | 0.557 | 0.541 | 18.8 | 11.854638 |
| 0 | Baseline_Only_CreditRatings | 0.729 | 0.199 | 0.677 | 0.521 | 11.632 | 0.677 | 0.554 | 0.666 | 0.521 | 0.246 | 0.654 | 0.152 | 0.324 | 0.449 | 0.547 | 0.613 | 2.3 | 2.143046 |
Full DS size: 307511
    NameError                                 Traceback (most recent call last)
    Cell In[10], line 11
          8 features_matrix = target_model_config[0].model_pipeline_config.load_data(
          9     loader_function=feature_builder_v2.load_datasets_and_prepare_features)
         10 #
    ---> 11 features_all, labels_all = pipeline._get_features_labels(features_matrix)
         12 X_train, X_test, y_train, y_test = pipeline.get_deterministic_train_test_split(
         13     features_all, labels_all
         14 )
         16 X_train = X_train.drop(columns=["TARGET"])
    NameError: name 'pipeline' is not defined
The calibration curve, also known as a reliability diagram, is a graphical representation used to evaluate the accuracy of predicted probabilities in classification models. It specifically checks how well the predicted probabilities of a model match the actual outcomes.
The ideal calibration curve is a straight line at a 45-degree angle from the bottom left to the top right of the plot. This line, often called the "line of perfect calibration," indicates that the model's predictions are perfectly calibrated. If your model predicts a class with 70% probability, then 70% of the cases that are predicted as such should indeed belong to that class.
Common Patterns and Their Interpretations:
Perfect Calibration: The points lie on the diagonal line from (0,0) to (1,1). Example: If a model predicts an event with 30% probability, then in the long run, that event occurs about 30% of the time when predicted at this probability.
Underconfidence: The curve lies above the diagonal line. The model's probabilities are lower than the true likelihood of the event. For instance, if events the model predicts to happen 60% of the time actually happen 80% of the time, the model is underconfident.
Overconfidence: The curve lies below the diagonal line. The model predicts higher probabilities than what is true. If a prediction of 80% only happens 60% of the time, the model is overconfident.
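Expected Calibration Error (ECE): Predictions are grouped into probability bins, and ECE is the average absolute difference between the mean predicted probability and the observed event rate in each bin, weighted by the number of predictions in the bin. Lower values indicate better calibration.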
Brier Score: Measures the mean squared difference between the predicted probability and the actual outcome. It is a good measure of the accuracy and calibration of the predictions.
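The two metrics above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: equal-width bins are assumed for ECE (the notebook's actual binning is not shown), and the function names and toy data are hypothetical, not taken from the project code.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Group predictions into equal-width probability bins (an assumed scheme).

    Returns (mean_predicted, observed_rate, count) for every non-empty bin.
    Plotting observed_rate against mean_predicted gives the calibration curve.
    """
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]  # [sum of probs, sum of labels, count]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        sums[idx][0] += p
        sums[idx][1] += y
        sums[idx][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in sums if n > 0]


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Count-weighted mean of |observed_rate - mean_predicted| over the bins."""
    bins = calibration_bins(y_true, y_prob, n_bins)
    total = sum(n for _, _, n in bins)
    return sum(n / total * abs(obs - pred) for pred, obs, n in bins)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data, made up for illustration only (not results from the notebook).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.15, 0.35, 0.45, 0.55, 0.65, 0.85, 0.95]

ece = expected_calibration_error(y_true, y_prob, n_bins=5)
brier = brier_score(y_true, y_prob)
print(f"ECE: {ece:.4f}, Brier score: {brier:.4f}")
```

A bin whose observed rate is above its mean predicted probability corresponds to a point above the diagonal (underconfidence), and a bin below it to overconfidence.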
Residual Plots: These show the difference between observed outcomes and predicted probabilities, and help in checking the assumption of homoscedasticity (constant residual variance across the range of predictions). Ideally, residuals should be randomly dispersed around the central line; systematic patterns suggest model inadequacies.
Based on the chart, we see these potential problems:
Probability Estimates are Polarized:
The cup-like pattern at the top suggests the model is very confident (probabilities close to 0 or 1) about certain instances but is incorrect, as these points have higher residuals. The bottom lines being straighter and closer to zero indicate that for a range of predicted probabilities, the residuals are consistently low, which means the model performs well in that range.
Model Overconfidence:
The residuals are larger for predictions near 0 or 1 because the log loss penalizes confident incorrect predictions more harshly than less confident ones. This overconfidence is often characteristic of models that are not well-calibrated and could benefit from probability calibration techniques.
Class Imbalance:
This pattern can sometimes emerge from class imbalance if the model is better at predicting the majority class and frequently mispredicts the minority class with high confidence.
Non-linearity in Feature Space:
The curving pattern could also be a sign that the model is not capturing some non-linear relationships between features and the outcome. This might suggest that feature engineering or a more sophisticated model could be helpful.
Mean Absolute Error (MAE) of Residuals: This is the average of the absolute values of the residuals. It gives an idea of the average magnitude of the prediction errors. Annotation: Indicating MAE on the residual plot can help assess the typical error magnitude in a more intuitive way than just viewing the spread of residuals.
Maximum Residual: The maximum value among the residuals can indicate the worst-case scenario for your predictions. Annotation: Marking the maximum residual can alert users to the worst errors the model could make.
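As a rough sketch of the two summary statistics, the snippet below assumes the simplest residual definition, observed outcome minus predicted probability. The notebook's plot may use a different residual (for example one derived from log loss), so the function names and toy values here are illustrative assumptions only.

```python
def probability_residuals(y_true, y_prob):
    """Residual = observed 0/1 outcome minus predicted probability (assumed definition)."""
    return [y - p for y, p in zip(y_true, y_prob)]


def residual_summary(residuals):
    """Return the mean absolute residual and the largest absolute residual."""
    abs_res = [abs(r) for r in residuals]
    worst = max(range(len(abs_res)), key=abs_res.__getitem__)
    return {
        "mae": sum(abs_res) / len(abs_res),  # typical error magnitude
        "max_abs": abs_res[worst],           # worst-case error
        "max_index": worst,                  # which observation produced it
    }


# Toy data, made up for illustration only.
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.8, 0.3, 0.6]

summary = residual_summary(probability_residuals(y_true, y_prob))
print(summary)
```

Both values could then be drawn on the residual plot as horizontal reference lines or text annotations.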
In the finance industry, loan grades (or credit scores) are a crucial part of risk management, helping lenders assess the creditworthiness of borrowers. These grades are typically determined based on various factors, including the borrower's credit history, income stability, debt-to-income ratio, and more. The grades reflect the estimated risk of default, and they directly influence the interest rate offered to the borrower. Commonly, loan grades are categorized from 'A' (lowest risk) to 'G' (highest risk), although the specific categories can vary by institution.
A (Lowest Risk): Below 1% default rate. Borrowers with excellent credit histories and very low risk of default.
B: 1% to 3% default rate. Borrowers with good credit histories and low risk of default.
C: 3% to 7% default rate. Borrowers with average credit histories and moderate risk of default.
D: 7% to 15% default rate. Borrowers with below-average credit histories and higher risk of default.
E: 15% to 25% default rate. Borrowers with poor credit histories and very high risk of default.
F and G (Highest Risk): Above 25% default rate. Borrowers with very poor credit histories and extremely high risk of default.
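One way to turn a predicted default probability into one of these grades is a simple threshold lookup. The sketch below is an assumption about how such a mapping could look, not the notebook's implementation: the names `GRADE_BOUNDS`, `GRADES` and `loan_grade` are hypothetical, F and G are merged because the text gives them a single band, and a probability exactly on a boundary is assigned to the higher-risk grade. The mapping is only meaningful if the model's probabilities are reasonably well calibrated.

```python
from bisect import bisect_right

# Upper bounds of the default-rate bands listed above (A < 1%, B 1-3%, ...).
GRADE_BOUNDS = [0.01, 0.03, 0.07, 0.15, 0.25]
GRADES = ["A", "B", "C", "D", "E", "F/G"]


def loan_grade(default_probability):
    """Map a predicted default probability in [0, 1] to a loan grade."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    # bisect_right counts how many band boundaries lie at or below the probability.
    return GRADES[bisect_right(GRADE_BOUNDS, default_probability)]


for p in (0.005, 0.02, 0.05, 0.10, 0.20, 0.40):
    print(f"{p:.3f} -> {loan_grade(p)}")
```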
Full DS size: 307511
| | Total NaN Values | Proportion NaN (%) |
|---|---|---|
| PrevRatioRejectedAccepted | 16847 | 5.0 |
This notebook includes the analysis of selected variables (chosen based on their importance in predicting the target variable) and their relationships. Individual analysis of each variable is available in the EDA_appendices notebook.
NaN Values by Column:
| | Total NaN Values | Proportion NaN (%) |
|---|---|---|
| ExtSource2 | 660 | 0.0 |
| ExtSource3 | 60965 | 20.0 |
| ExtSource1 | 173378 | 56.0 |
| AmtGoodsPrice | 278 | 0.0 |
| OwnCarAge | 202929 | 66.0 |
| PrevAmtDownPaymentSum | 16454 | 5.0 |
| AmtAnnuity | 12 | 0.0 |
| MeanbureaudaysCredit | 44020 | 14.0 |
| MeanbureauamtCreditSumDebt | 51380 | 17.0 |
| PrevAvgYieldGroup | 18945 | 6.0 |
| PrevCreditReceivedRequestedDiff | 16454 | 5.0 |
| OccupationType | 96391 | 31.0 |
| PrevRatioRejectedAccepted | 16847 | 5.0 |
| MaxbureaudaysCreditEnddate | 46269 | 15.0 |
| PrevLastLoanGoodsCategory | 16454 | 5.0 |
| MeanbureauamtCreditMaxOverdue | 123625 | 40.0 |
'Duplicated Values: 0'
'Total Columns: 229'
Because we have such a large number of columns, we have only included features which have an importance value { > X } in our final LGBM model: TODO
The TARGET variable (loans with payment difficulties) is most correlated with credit ratings obtained from external sources. The correlation is very weak but still significant.
Because the data types of the features vary, we had to use different methods to measure the strength and significance of the relationship for each pair:
Chi-Squared Test: Assesses independence between two categorical variables. Used for bool-bool pairs, since both variables are categorical.
Point Biserial Correlation: Measures correlation between a binary and a continuous variable. Used for bool-numerical pairs to account for the mixed data types.
Spearman's Rank Correlation: Assesses the monotonic relationship between two continuous variables. Used for numerical-numerical pairs (suitable for non-normally distributed data).
Since the Chi-Squared test outputs an unbounded statistic which can't be directly compared to the point-biserial or Spearman coefficients, we have converted it to Cramér's V, which is normalized between 0 and 1. This was done to make the values in the matrix more uniform; however, we must note that Cramér's V and Spearman's correlation coefficient are fundamentally different statistics and generally can't be directly compared.
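The conversion from the chi-squared statistic to Cramér's V can be sketched as follows. This is a dependency-free illustration with a hypothetical function name, not the notebook's code. It uses the uncorrected chi-squared statistic, whereas `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so values may differ slightly from a SciPy-based pipeline.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts.

    Assumes every row and every column contains at least one observation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalize by sample size and the smaller table dimension to land in [0, 1].
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Toy 2x2 tables, made up for illustration only.
print(cramers_v([[30, 10], [10, 30]]))  # moderate association
print(cramers_v([[5, 5], [5, 5]]))      # no association
```

Unlike the point-biserial and Spearman coefficients, Cramér's V carries no sign, which is one reason the values in the matrix are not strictly comparable.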
Our target variable TARGET shows whether the given application had any late payments (value = 1). We can see that no single feature is strongly correlated with it:
| Column | Coefficient | P-Value |
|---|---|---|
| ExtSource3 | -0.161 | 0.000 |
| ExtSource1 | -0.131 | 0.000 |
| ExtSource2 | -0.128 | 0.000 |
| MeanbureaudaysCredit | 0.093 | 0.000 |
| OccupationType | 0.075 | 0.000 |
| DaysEmployed | 0.074 | 0.000 |
| PrevRatioRejectedAccepted | 0.073 | 0.000 |
| PrevRatioRejectedAccepted_cats_2 | 0.072 | 0.000 |
| PrevRatioRejectedAccepted_cats | 0.072 | 0.000 |
| OrganizationType | 0.069 | 0.000 |
| NameEducationType | 0.067 | 0.000 |
| PrevAmtDownPaymentSum | -0.057 | 0.000 |
| PrevCreditReceivedRequestedDiff | 0.055 | 0.000 |
| DaysBirth | 0.053 | 0.000 |
| PrevLastLoanGoodsCategory | 0.051 | 0.000 |
| OwnCarAge | 0.050 | 0.000 |
| MeanbureauamtCreditSumDebt | 0.049 | 0.000 |
| MeanbureauamtCreditMaxOverdue | 0.044 | 0.000 |
| DaysIdPublish | 0.042 | 0.000 |
| CodeGender | 0.041 | 0.000 |
| PrevAvgYieldGroup | 0.040 | 0.000 |
| FlagDocument3 | 0.039 | 0.000 |
| AmtGoodsPrice | -0.034 | 0.000 |
| MaxbureaudaysCreditEnddate | 0.034 | 0.000 |
| NameFamilyStatus | 0.027 | 0.002 |
| AmtCredit | -0.023 | 0.001 |
| AmtAnnuity | 0.003 | 0.664 |
The chart below shows the relationship between selected categorical variables and loan status. E.g. a significantly higher proportion of loans taken out by males had issues.
CategoricalDtype(categories=['< 25% Rejected', '> 25% Rejected', 'All Accepted', 'No Previous App.'], ordered=False, categories_dtype=object)
The charts below show pairs of numerical and categorical features (including some binned numerical features) that have a significant relationship and at least a small effect size (eta_squared > 0.01) based on the non-parametric Kruskal-Wallis test (one-way ANOVA on ranks), which tests whether samples originate from the same distribution.
*It's similar to the Mann–Whitney U test but allows comparing more than 2 groups
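For reference, the Kruskal-Wallis H statistic and a rank-based eta squared can be sketched as below. The notebook does not show how eta_squared was computed, so the formula used here, (H - k + 1) / (n - k) with k groups and n observations, is an assumption based on a commonly cited definition, and the function names are hypothetical. In practice `scipy.stats.kruskal` would normally supply H and the p-value.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (with tie correction) for a list of samples."""
    values = sorted(v for g in groups for v in g)
    n = len(values)

    # Assign each distinct value the average of the ranks it occupies.
    ranks, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i                            # size of this tie group
        tie_term += t ** 3 - t
        i = j

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")

    rank_sum_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    return h / correction


def eta_squared(h, n_groups, n_total):
    """Rank-based effect size for Kruskal-Wallis (assumed formula, see lead-in)."""
    return (h - n_groups + 1) / (n_total - n_groups)


# Toy data, made up for illustration only: three fully separated groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
effect = eta_squared(h, n_groups=len(groups), n_total=sum(len(g) for g in groups))
print(f"H = {h:.3f}, eta squared = {effect:.3f}")
```

A pair would pass the filter described above when the test is significant and the effect size exceeds 0.01.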
ExtSource1/2/3 are the variables most strongly correlated with the target variable; they are client credit scores obtained from external sources. While the correlation coefficients are very low (only slightly above 0.1), we'll look a bit more into these scores because credit ratings usually tend to be the most useful metric when estimating the risk of specific loans:
Summary for combined model:
Logit Regression Results
==============================================================================
Dep. Variable: TARGET No. Observations: 109589
Model: Logit Df Residuals: 109585
Method: MLE Df Model: 3
Date: Mon, 29 Apr 2024 Pseudo R-squ.: 0.1047
Time: 19:57:19 Log-Likelihood: -25636.
converged: True LL-Null: -28634.
Covariance Type: nonrobust LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 0.6002 0.040 14.829 0.000 0.521 0.680
ExtSource1 -2.0989 0.061 -34.382 0.000 -2.219 -1.979
ExtSource2 -1.9640 0.060 -32.654 0.000 -2.082 -1.846
ExtSource3 -2.7793 0.062 -44.483 0.000 -2.902 -2.657
==============================================================================
This is a simple logistic regression model that uses only the three credit scores to estimate the target variable. The confidence interval shows the standard deviation of the residuals from the combined logistic regression model (residuals in this context are the differences between the observed values, y_combined, and the predicted probabilities).
Generally, the explained variability (Pseudo R-squared) is quite low at only 0.1047; however, the model itself is statistically significant (LLR p-value reported as 0.000).
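As a quick sanity check, the Pseudo R-squared reported by statsmodels for a Logit model is McFadden's measure, which can be recomputed from the two log-likelihoods printed in the summary. The sketch below uses only the rounded values from that summary, so it agrees to roughly four decimals; the variable names are ours, not the notebook's.

```python
# Values copied from the Logit summary (rounded as printed there).
log_likelihood = -25636.0       # Log-Likelihood of the fitted model
log_likelihood_null = -28634.0  # LL-Null: intercept-only model
df_model = 3                    # number of predictors (Df Model)

# McFadden's pseudo R-squared: 1 - LL_model / LL_null
pseudo_r2 = 1.0 - log_likelihood / log_likelihood_null

# Likelihood-ratio statistic; under the null hypothesis it follows a
# chi-squared distribution with df_model degrees of freedom.
llr_statistic = 2.0 * (log_likelihood - log_likelihood_null)

print(f"pseudo R^2    = {pseudo_r2:.4f}")
print(f"LLR statistic = {llr_statistic:.0f} on {df_model} degrees of freedom")
```

A likelihood-ratio statistic this large on 3 degrees of freedom is why the LLR p-value prints as 0.000, even though the share of explained variability is small.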
| | Coefficient | Standard Error | P-Value | Conf. Interval Lower | Conf. Interval Upper |
|---|---|---|---|---|---|
| const | 0.600 | 0.040 | 0.0 | 0.521 | 0.680 |
| ExtSource1 | -2.099 | 0.061 | 0.0 | -2.219 | -1.979 |
| ExtSource2 | -1.964 | 0.060 | 0.0 | -2.082 | -1.846 |
| ExtSource3 | -2.779 | 0.062 | 0.0 | -2.902 | -2.657 |
Normalized credit ratings from three sources are inversely related to default risk, with ExtSource3 having the strongest influence. We can see that a basic Logistic model can already provide a reasonably high result (AUC = 0.74). However, we have to note that the results are based on the full training set and are only provided for EDA/feature analysis purposes. Full statistical modelling will be done in further sections.
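To make the coefficients more concrete, here is a minimal sketch that plugs them into the logistic function. It assumes all three scores are present and lie in the 0 to 1 range; the function names and the example score values are hypothetical and only illustrate the direction and rough size of the effect.

```python
import math

# Coefficients copied from the Logit summary above.
COEFFICIENTS = {
    "const": 0.6002,
    "ExtSource1": -2.0989,
    "ExtSource2": -1.9640,
    "ExtSource3": -2.7793,
}


def default_probability(ext1: float, ext2: float, ext3: float) -> float:
    """Predicted probability of TARGET = 1 for the given normalized scores."""
    log_odds = (
        COEFFICIENTS["const"]
        + COEFFICIENTS["ExtSource1"] * ext1
        + COEFFICIENTS["ExtSource2"] * ext2
        + COEFFICIENTS["ExtSource3"] * ext3
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_multiplier(feature: str, increase: float = 0.1) -> float:
    """Factor by which the odds of default change when one score rises by `increase`."""
    return math.exp(COEFFICIENTS[feature] * increase)


# Hypothetical applicants with all three scores low, medium and high.
for score in (0.2, 0.5, 0.8):
    print(score, round(default_probability(score, score, score), 3))

# Effect of a 0.1 increase in each score on the odds of default.
for name in ("ExtSource1", "ExtSource2", "ExtSource3"):
    print(name, round(odds_multiplier(name), 3))
```

Because every coefficient is negative, each multiplier is below 1, and ExtSource3 has the smallest multiplier, which is what "strongest influence" means here.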
We can see that while the external credit scores are clearly related to default risk, their explanatory power is somewhat limited because there is still a large amount of overlap between the two groups. The overlap is especially large for ExtSource2, yet its coefficient in our logistic model is similar to that of ExtSource1.
Had any clients previously applied for loans with Home Credit, and what were the outcomes of their applications?
| PrevRatioRejectedAccepted_cats | count |
|---|---|
| All Accepted | 190370 |
| > 25% Rejected | 66215 |
| < 25% Rejected | 34079 |
| No Previous App. | 16847 |
Did any applicants default on any previous loans?
| TotalDefaults_cats | count |
|---|---|
| No Defaults | 304114 |
| 1 Defaulted Loans | 3397 |
Surprisingly, about 1% of all applicants who were granted a loan had previously had payment difficulties with an earlier loan at Home Credit. This is notable, considering that credit institutions are generally reluctant to lend again to problematic clients.
Total "Defaults"/Loans With Payment Difficulties per applicant:
| | TotalDefaults | count | proportion |
|---|---|---|---|
| 0 | 0.0 | 304114 | 0.99 |
| 1 | 1.0 | 3177 | 0.01 |
| 2 | 2.0 | 163 | 0.00 |
| 3 | 3.0 | 38 | 0.00 |
| 4 | 4.0 | 11 | 0.00 |
| 5 | 5.0 | 4 | 0.00 |
| 6 | 6.0 | 3 | 0.00 |
| 7 | 7.0 | 1 | 0.00 |
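The rounded proportions in the table hide most of the detail, so the short sketch below recomputes the shares directly from the counts. Only the counts come from the table; the variable names are ours.

```python
# Counts copied from the TotalDefaults table above
# (key = number of previous loans with payment difficulties).
defaults_per_applicant = {0: 304114, 1: 3177, 2: 163, 3: 38, 4: 11, 5: 4, 6: 3, 7: 1}

total_applicants = sum(defaults_per_applicant.values())
with_any_default = total_applicants - defaults_per_applicant[0]

# Share of all applicants with at least one previous problematic loan.
share_with_default = with_any_default / total_applicants

# Among those, the share with two or more problematic loans.
repeat_defaulters = sum(n for k, n in defaults_per_applicant.items() if k >= 2)
share_repeat = repeat_defaulters / with_any_default

print(f"applicants with any previous default: {with_any_default} "
      f"({share_with_default:.2%} of {total_applicants})")
print(f"of which with two or more: {repeat_defaulters} ({share_repeat:.2%})")
```

The first figure reproduces the 3397 applicants reported in the category counts, which confirms that the "1 Defaulted Loans" category covers one or more problematic loans.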
The chart below shows the default rate based on whether the applicant has previously applied for loans with Home Credit:
No Previous App. - no previous applications for client found (i.e. new clients)
All Accepted - all previous applications were accepted
< 25% Rejected - less than 1/4 applications were rejected
> 25% Rejected - more than 1/4 applications were rejected
Interestingly, while applicants whose previous applications were rejected are significantly more likely to default when finally given a loan, previous clients with no rejected applications still have a higher default risk than new clients.
This likely limits the usefulness of the previous_application table because only a small proportion of clients have previously rejected applications.
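For reference, the four categories described above could be derived along the following lines. This is only a sketch: the function name and inputs are hypothetical, the notebook's own binning of PrevRatioRejectedAccepted may be defined differently, and the handling of a share of exactly 25% is an assumption.

```python
def rejection_category(n_rejected: int, n_total: int) -> str:
    """Map an applicant's previous-application history to one of the four labels.

    n_rejected: number of previous applications that were rejected
    n_total:    number of all previous applications found for the client
    """
    if n_total == 0:
        return "No Previous App."
    if n_rejected == 0:
        # Assumption: no rejections is treated as "all accepted".
        return "All Accepted"
    share_rejected = n_rejected / n_total
    # Assumption: a share of exactly 25% falls into the upper bucket.
    return "< 25% Rejected" if share_rejected < 0.25 else "> 25% Rejected"


# Hypothetical examples, one per category.
for rejected, total in [(0, 0), (0, 5), (1, 10), (3, 5)]:
    print(rejected, total, rejection_category(rejected, total))
```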
We can clearly see that clients who ran into payment issues with their past loans tend to have a significantly lower ExtSource3 score, whereas there is almost no difference in the other two scores. This suggests that data from Home Credit itself is only included in the third rating (which might explain its higher explanatory power in our logistic model).
AmtIncomeTotal AmtCredit AmtAnnuity AmtGoodsPrice AmtReqCreditBureauHour AmtReqCreditBureauDay AmtReqCreditBureauWeek AmtReqCreditBureauMon AmtReqCreditBureauQrt AmtReqCreditBureauYear MaxbureauamtAnnuity MaxbureauamtCreditMaxOverdue MaxbureauamtCreditSum MaxbureauamtCreditSumDebt MaxbureauamtCreditSumLimit MaxbureauamtCreditSumOverdue MeanbureauamtAnnuity MeanbureauamtCreditMaxOverdue MeanbureauamtCreditSum MeanbureauamtCreditSumDebt MeanbureauamtCreditSumLimit MeanbureauamtCreditSumOverdue MinbureauamtAnnuity MinbureauamtCreditMaxOverdue MinbureauamtCreditSum MinbureauamtCreditSumDebt MinbureauamtCreditSumLimit MinbureauamtCreditSumOverdue PrevAmtApplicationMean PrevAmtApplicationSum PrevAmtCreditMean PrevAmtCreditSum PrevAmtDownPaymentSum
The EDA was performed in parallel with feature engineering (aggregation of the non-main tables) and with building an initial LGBM model (using all features). To minimize unnecessary complexity, only features which have some importance { > X } are included.
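The importance-based filtering described above can be sketched as follows. The threshold is left as a parameter because its value is not stated here, and the importance scores in the example are hypothetical placeholders rather than results from the notebook. With LightGBM's scikit-learn interface, such scores would typically come from the fitted model's `feature_importances_` attribute.

```python
def select_features(importances: dict[str, float], min_importance: float) -> list[str]:
    """Keep features whose importance is strictly above the threshold,
    ordered from most to least important."""
    kept = [(name, score) for name, score in importances.items() if score > min_importance]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical importance scores, for illustration only.
example_importances = {
    "ExtSource3": 120.0,
    "ExtSource2": 95.0,
    "FlagDocument4": 0.0,
    "AmtCredit": 40.0,
}

selected = select_features(example_importances, min_importance=1.0)
print(selected)
```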
| | TARGET | NameContractType | CodeGender | FlagOwnCar | FlagOwnRealty | CntChildren | AmtIncomeTotal | AmtCredit | AmtAnnuity | AmtGoodsPrice | NameTypeSuite | NameIncomeType | NameEducationType | NameFamilyStatus | NameHousingType | RegionPopulationRelative | DaysBirth | DaysEmployed | DaysRegistration | DaysIdPublish | OwnCarAge | FlagMobil | FlagEmpPhone | FlagWorkPhone | FlagContMobile | FlagPhone | FlagEmail | OccupationType | CntFamMembers | RegionRatingClient | RegionRatingClientWCity | WeekdayApprProcessStart | HourApprProcessStart | RegRegionNotLiveRegion | RegRegionNotWorkRegion | LiveRegionNotWorkRegion | RegCityNotLiveCity | RegCityNotWorkCity | LiveCityNotWorkCity | OrganizationType | ExtSource1 | ExtSource2 | ExtSource3 | ApartmentsAvg | BasementareaAvg | YearsBeginexpluatationAvg | YearsBuildAvg | CommonareaAvg | ElevatorsAvg | EntrancesAvg | FloorsmaxAvg | FloorsminAvg | LandareaAvg | LivingapartmentsAvg | LivingareaAvg | NonlivingapartmentsAvg | NonlivingareaAvg | ApartmentsMode | BasementareaMode | YearsBeginexpluatationMode | YearsBuildMode | CommonareaMode | ElevatorsMode | EntrancesMode | FloorsmaxMode | FloorsminMode | LandareaMode | LivingapartmentsMode | LivingareaMode | NonlivingapartmentsMode | NonlivingareaMode | ApartmentsMedi | BasementareaMedi | YearsBeginexpluatationMedi | YearsBuildMedi | CommonareaMedi | ElevatorsMedi | EntrancesMedi | FloorsmaxMedi | FloorsminMedi | LandareaMedi | LivingapartmentsMedi | LivingareaMedi | NonlivingapartmentsMedi | NonlivingareaMedi | FondkapremontMode | HousetypeMode | TotalareaMode | WallsmaterialMode | EmergencystateMode | Obs30CntSocialCircle | Def30CntSocialCircle | Obs60CntSocialCircle | Def60CntSocialCircle | DaysLastPhoneChange | FlagDocument2 | FlagDocument3 | FlagDocument4 | FlagDocument5 | FlagDocument6 | ... | MinbureauamtCreditSum | MinbureauamtCreditSumDebt | MinbureauamtCreditSumLimit | MinbureauamtCreditSumOverdue | MinbureaucntCreditProlong | MinbureaucreditDayOverdue | MinbureaudaysCredit | MinbureaudaysCreditEnddate | MinbureaudaysCreditUpdate | MinbureaudaysEnddateFact | MinbureauskIdBureau | ActiveLoansCount | TotalDefaults | DefaultRatio | LastLoanIssuedDays | PrevAmtApplicationMean | PrevAmtApplicationSum | PrevAmtCreditMean | PrevAmtCreditSum | PrevAmtDownPaymentSum | PrevRateInterestPrimaryMean | PrevRateInterestPrimaryStd | PrevTotalDpdSum | PrevHasAnyDpdMean | PrevHasAnyDpdSum | PrevMonthsWithDpdPropMean | PrevMonthsWithDpdPropSum | PrevTotalPreviousLoans | PrevCreditReceivedRequestedDiff | PrevRatioSumDownPaymentCredit | PrevLastLoanInterestRate | PrevLastLoanPurpose | PrevLastLoanContractStatus | PrevLastLoanDecisionDate | PrevLastLoanPaymentType | PrevLastLoanCodeRejectReason | PrevLastLoanClientType | PrevLastLoanPortfolio | PrevLastLoanGoodsCategory | PrevLastLoanProductType | PrevLastLoanYieldGroup | PrevContractStatusApprovedCount | PrevContractStatusCanceledCount | PrevContractStatusRefusedCount | PrevContractStatusUnusedofferCount | PrevPortfolioCardsCount | PrevPortfolioCarsCount | PrevPortfolioCashCount | PrevPortfolioPosCount | PrevPortfolioXnaCount | PrevProductTypeXnaCount | PrevProductTypeWalkinCount | PrevProductTypeXsellCount | PrevAvgYieldGroup | PrevDaysAfterFirstApplication | PrevCurrentlyActiveLoans | PrevApprovedLoans | PrevCanceledLoans | PrevRefusedLoans | PrevUnusedofferLoans | PrevTotalLoans | PrevAcceptedToTotalRatio | PrevCancelledToTotalRatio | PrevRefusedToTotalRatio | PrevUnusedToTotalRatio | PrevRatioRejectedAccepted | PrevCodeRejectReasonClientCount | PrevCodeRejectReasonHcCount | PrevCodeRejectReasonLimitCount | PrevCodeRejectReasonScoCount | PrevCodeRejectReasonScofrCount | PrevCodeRejectReasonSystemCount | PrevCodeRejectReasonVerifCount | PrevCodeRejectReasonXapCount | PrevCodeRejectReasonXnaCount | PrevLastLoanNflagInsuredOnApproval | PrevAvgNflagInsuredOnApproval | ExtSource2_binned | ExtSource3_binned | ExtSource1_binned | DaysEmployed_binned | AmtCredit_binned | OwnCarAge_binned | PrevCreditReceivedRequestedDiff_binned | DaysBirth_binned | PrevAmtDownPaymentSum_binned | MeanbureauamtCreditSumDebt_binned | MaxbureaudaysCreditEnddate_binned | PrevAvgYieldGroup_binned | MeanbureauamtCreditMaxOverdue_binned | AmtGoodsPrice_binned | DaysLastPhoneChange_binned | PrevRatioRejectedAccepted_binned | DaysIdPublish_binned | AmtAnnuity_binned | AmtIncomeTotal_binned | FlagDocument3_binned | MaxbureaudaysCredit_binned | PrevAcceptedToTotalRatio_binned | MaxbureaudaysEnddateFact_binned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Cash loans | M | True | False | 2 | 207000.0 | 465457.5 | 52641.0 | 418500.0 | Unaccompanied | Commercial associate | Secondary / secondary special | Married | House / apartment | 0.009630 | -13297 | -762 | -637.0 | -4307 | 19 | 1 | 1 | 0 | 1 | 0 | 0 | Sales staff | 4 | 2 | 2 | THURSDAY | 11 | 0 | 0 | 0 | 0 | 1 | 1 | Business Entity Type 3 | 0.675878 | 0.604894 | 0.000527 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | 0 | 0 | 0 | 0 | -2.0 | 0 | 1 | 0 | 0 | 0 | ... | 19449.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | -2857.0 | -2710.0 | -1033.0 | -1036.0 | 5022129.0 | 3.0 | 0.0 | 0.0 | -225 | 22279.500000 | 111397.50 | 100030.500000 | 500152.50 | 6642.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 388755.0 | 0.013280 | NaN | XNA | Refused | -1291.0 | Cash through the bank | HC | Repeater | Cash | XNA | walk-in | high | 2.0 | 0.0 | 3.0 | 0.0 | 2.0 | 0.0 | 1.0 | 2.0 | 0.0 | 2.0 | 3.0 | 0.0 | 1.666667 | -546.0 | NaN | 2.0 | 0.0 | 3.0 | 0.0 | 5.0 | 0.400000 | 0.000000 | 0.600000 | 0.000000 | 1.000000 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | NaN | 0.000000 | 0.57 - 0.66 | 0.0 - 0.37 | 0.68 - 0.96 | -1211.0 - -289.0 | 270000.0 - 512338.5 | 15 - 91 | 117146.25 - 10985769.0 | -15739.0 - -12412.0 | 6111.0 - 17820.0 | 0.0 - 43926.19 | 120.0 - 906.0 | 1.33 - 2.0 | 0.0 - 2076.99 | 238500.0 - 450000.0 | -279.0 - 0.0 | 0.25 - 13.6 | -6551.0 - -4305.0 | 34587.0 - 230161.5 | 202500.0 - 117000000.0 | 1 | -143.0 - -1.0 | 0.0 - 0.5 | -347.0 - -153.0 |
| 1 | 0 | Cash loans | F | True | True | 0 | 247500.0 | 1281712.5 | 48946.5 | 1179000.0 | Unaccompanied | Commercial associate | Higher education | Single / not married | House / apartment | 0.006852 | -14778 | -1141 | -1610.0 | -4546 | 11 | 1 | 1 | 0 | 1 | 0 | 1 | Managers | 1 | 3 | 3 | THURSDAY | 10 | 0 | 0 | 0 | 0 | 0 | 0 | Business Entity Type 3 | 0.430827 | 0.425351 | 0.712155 | 0.0753 | 0.0568 | 0.9970 | 0.9592 | 0.1326 | 0.08 | 0.0517 | 0.4167 | 0.2917 | 0.0735 | 0.0601 | 0.0844 | 0.0058 | 0.1118 | 0.0756 | 0.0566 | 0.9940 | 0.9216 | 0.0523 | 0.0806 | 0.0345 | 0.3333 | 0.0417 | 0.0445 | 0.0652 | 0.0857 | 0.0 | 0.0000 | 0.0760 | 0.0568 | 0.9970 | 0.9597 | 0.1335 | 0.08 | 0.0517 | 0.4167 | 0.2917 | 0.0748 | 0.0611 | 0.0859 | 0.0058 | 0.1142 | reg oper account | block of flats | 0.0754 | Monolithic | False | 2 | 0 | 2 | 0 | -1071.0 | 0 | 1 | 0 | 0 | 0 | ... | 190867.5 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | -1487.0 | -391.0 | -765.0 | -765.0 | 5977750.0 | 2.0 | 0.0 | 0.0 | -52 | 454259.250000 | 5451111.00 | 488020.875000 | 5856250.50 | 36603.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 405139.5 | 0.006250 | NaN | XAP | Refused | -757.0 | XNA | HC | Repeater | Cards | XNA | walk-in | XNA | 10.0 | 0.0 | 2.0 | 0.0 | 2.0 | 0.0 | 7.0 | 3.0 | 0.0 | 3.0 | 3.0 | 6.0 | 1.000000 | -245.0 | NaN | 10.0 | 0.0 | 2.0 | 0.0 | 12.0 | 0.833333 | 0.000000 | 0.166667 | 0.000000 | 0.181818 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 0.0 | NaN | 0.555556 | 0.39 - 0.57 | 0.67 - 0.89 | 0.33 - 0.51 | -1211.0 - -289.0 | 808650.0 - 4050000.0 | 9 - 15 | 117146.25 - 10985769.0 | -15739.0 - -12412.0 | 17820.0 - 3960000.0 | -220213.42 - 0.0 | 1682.0 - 31199.0 | 0.5 - 1.0 | 2076.99 - 47406123.0 | 679500.0 - 4050000.0 | -1571.0 - -762.0 | 0.0 - 0.25 | -6551.0 - -4305.0 | 34587.0 - 230161.5 | 202500.0 - 117000000.0 | 1 | -300.0 - -143.0 | 0.8 - 1.0 | -153.0 - 0.0 |
| 2 | 0 | Cash loans | F | True | False | 0 | 202500.0 | 495000.0 | 39109.5 | 495000.0 | Unaccompanied | Working | Secondary / secondary special | Married | House / apartment | 0.035792 | -17907 | -639 | -2507.0 | -1461 | 4 | 1 | 1 | 1 | 1 | 0 | 0 | Sales staff | 2 | 2 | 2 | TUESDAY | 16 | 0 | 0 | 0 | 0 | 0 | 0 | Self-employed | 0.527239 | 0.531760 | 0.207964 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | 5 | 0 | 5 | 0 | -1435.0 | 0 | 1 | 0 | 0 | 0 | ... | 179550.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | -2305.0 | -2123.0 | -1706.0 | -2123.0 | 5353341.0 | 3.0 | 0.0 | 0.0 | -394 | 121017.818571 | 847124.73 | 126556.547143 | 885895.83 | 12330.9 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 | 38771.1 | 0.013919 | NaN | XNA | Canceled | -107.0 | XNA | XAP | Repeater | XNA | XNA | XNA | XNA | 3.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 2.0 | 4.0 | 6.0 | 0.0 | 1.0 | 1.000000 | -2835.0 | NaN | 3.0 | 2.0 | 1.0 | 1.0 | 7.0 | 0.428571 | 0.285714 | 0.142857 | 0.142857 | 0.250000 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | NaN | 0.000000 | 0.39 - 0.57 | 0.0 - 0.37 | 0.51 - 0.68 | -1211.0 - -289.0 | 270000.0 - 512338.5 | 0 - 5 | 16020.0 - 117146.25 | -19681.0 - -15739.0 | 6111.0 - 17820.0 | 141482.43 - 43650000.0 | 906.0 - 1682.0 | 0.5 - 1.0 | NaN | 450000.0 - 679500.0 | -1571.0 - -762.0 | 0.0 - 0.25 | -1728.0 - 0.0 | 34587.0 - 230161.5 | 148500.0 - 202500.0 | 1 | -624.0 - -300.0 | 0.0 - 0.5 | -718.0 - -347.0 |
| 3 | 0 | Cash loans | F | False | True | 0 | 247500.0 | 254700.0 | 24939.0 | 225000.0 | Unaccompanied | State servant | Secondary / secondary special | Widow | House / apartment | 0.046220 | -19626 | -6982 | -11167.0 | -3158 | <NA> | 1 | 1 | 0 | 1 | 0 | 0 | High skill tech staff | 1 | 1 | 1 | FRIDAY | 14 | 0 | 0 | 0 | 0 | 0 | 0 | Business Entity Type 3 | NaN | 0.693521 | 0.614414 | 0.1320 | 0.0645 | 0.9846 | NaN | NaN | 0.16 | 0.0690 | 0.6250 | NaN | NaN | NaN | 0.1628 | NaN | 0.0022 | 0.1345 | 0.0670 | 0.9846 | NaN | NaN | 0.1611 | 0.0690 | 0.6250 | NaN | NaN | NaN | 0.1696 | NaN | 0.0023 | 0.1332 | 0.0645 | 0.9846 | NaN | NaN | 0.16 | 0.0690 | 0.6250 | NaN | NaN | NaN | 0.1657 | NaN | 0.0022 | NaN | NaN | 0.1285 | Panel | False | 0 | 0 | 0 | 0 | -2000.0 | 0 | 1 | 0 | 0 | 0 | ... | 38268.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | -2657.0 | -2322.0 | -2321.0 | -2321.0 | 5347375.0 | 0.0 | 0.0 | 0.0 | -1234 | 104292.000000 | 104292.00 | 103153.500000 | 103153.50 | 10431.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | -1138.5 | 0.101121 | NaN | XAP | Approved | -2000.0 | Cash through the bank | XAP | New | POS | Audio/Video | XNA | high | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.000000 | -2599.0 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.000000 | 0.66 - 0.85 | 0.54 - 0.67 | NaN | -17912.0 - -2760.25 | 45000.0 - 270000.0 | NaN | -3960000.0 - -558.0 | -19681.0 - -15739.0 | 6111.0 - 17820.0 | -220213.42 - 0.0 | -41875.0 - 120.0 | 1.33 - 2.0 | 2076.99 - 47406123.0 | 45000.0 - 238500.0 | -4131.0 - -1571.0 | 0.0 - 0.25 | -3269.0 - -1728.0 | 24916.5 - 34587.0 | 202500.0 - 117000000.0 | 1 | -2922.0 - -624.0 | 0.8 - 1.0 | -2858.0 - -718.0 |
| 4 | 0 | Cash loans | M | False | True | 0 | 112500.0 | 308133.0 | 15862.5 | 234000.0 | Unaccompanied | Working | Secondary / secondary special | Single / not married | House / apartment | 0.018850 | -20327 | -1105 | -7299.0 | -494 | <NA> | 1 | 1 | 0 | 1 | 0 | 0 | Laborers | 1 | 2 | 2 | WEDNESDAY | 11 | 0 | 0 | 0 | 0 | 0 | 0 | Business Entity Type 3 | 0.654882 | 0.560690 | 0.636376 | 0.0619 | 0.0553 | 0.9717 | NaN | NaN | 0.00 | 0.1724 | 0.1667 | NaN | 0.0866 | NaN | 0.0749 | NaN | 0.0149 | 0.0630 | 0.0574 | 0.9717 | NaN | NaN | 0.0000 | 0.1724 | 0.1667 | NaN | 0.0885 | NaN | 0.0780 | NaN | 0.0158 | 0.0625 | 0.0553 | 0.9717 | NaN | NaN | 0.00 | 0.1724 | 0.1667 | NaN | 0.0881 | NaN | 0.0762 | NaN | 0.0152 | NaN | block of flats | 0.0765 | Stone, brick | False | 0 | 0 | 0 | 0 | -173.0 | 0 | 1 | 0 | 0 | 0 | ... | 28575.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | -2618.0 | -2430.0 | -2415.0 | -2430.0 | 6293239.0 | 1.0 | 0.0 | 0.0 | -693 | 70180.000000 | 631620.00 | 177702.500000 | 1599322.50 | 2475.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 967702.5 | 0.001548 | NaN | XNA | Canceled | -173.0 | XNA | XAP | Repeater | XNA | XNA | XNA | XNA | 5.0 | 1.0 | 3.0 | 0.0 | 2.0 | 0.0 | 4.0 | 2.0 | 1.0 | 3.0 | 1.0 | 5.0 | 1.666667 | -2504.0 | NaN | 5.0 | 1.0 | 3.0 | 0.0 | 9.0 | 0.555556 | 0.111111 | 0.333333 | 0.000000 | 0.500000 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 6.0 | 0.0 | NaN | 0.600000 | 0.39 - 0.57 | 0.54 - 0.67 | 0.51 - 0.68 | -1211.0 - -289.0 | 270000.0 - 512338.5 | NaN | 117146.25 - 10985769.0 | -25201.0 - -19681.0 | 0.0 - 6111.0 | 141482.43 - 43650000.0 | 906.0 - 1682.0 | 1.33 - 2.0 | NaN | 45000.0 - 238500.0 | -279.0 - 0.0 | 0.25 - 13.6 | -1728.0 - 0.0 | 1980.0 - 16573.5 | 25650.0 - 112500.0 | 1 | -300.0 - -143.0 | 0.5 - 0.8 | -718.0 - -347.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 0 | Revolving loans | F | False | True | 1 | 202500.0 | 585000.0 | 29250.0 | 585000.0 | Unaccompanied | Working | Secondary / secondary special | Married | House / apartment | 0.010147 | -13827 | -1317 | -398.0 | -1172 | <NA> | 1 | 1 | 0 | 1 | 0 | 0 | Sales staff | 3 | 2 | 2 | MONDAY | 9 | 0 | 0 | 0 | 0 | 0 | 0 | Self-employed | 0.678014 | 0.591704 | 0.456110 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | 0 | 0 | 0 | 0 | -967.0 | 0 | 0 | 0 | 0 | 0 | ... | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | -1043.0 | -766.0 | -710.0 | -766.0 | 6624105.0 | 5.0 | 0.0 | 0.0 | -504 | 38610.000000 | 115830.00 | 38770.500000 | 116311.50 | 9220.5 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 481.5 | 0.079274 | NaN | XNA | Approved | -371.0 | XNA | XAP | Repeater | Cash | XNA | x-sell | middle | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.0 | 2.0 | 0.0 | 1.0 | 1.000000 | -2887.0 | NaN | 3.0 | 0.0 | 0.0 | 0.0 | 3.0 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.0 | 0.333333 | 0.57 - 0.66 | 0.37 - 0.54 | 0.68 - 0.96 | -2760.25 - -1211.0 | 512338.5 - 808650.0 | NaN | -558.0 - 16020.0 | -15739.0 - -12412.0 | 6111.0 - 17820.0 | 0.0 - 43926.19 | 906.0 - 1682.0 | 0.5 - 1.0 | 0.0 - 2076.99 | 450000.0 - 679500.0 | -1571.0 - -762.0 | 0.0 - 0.25 | -1728.0 - 0.0 | 24916.5 - 34587.0 | 148500.0 - 202500.0 | 0 | -624.0 - -300.0 | 0.8 - 1.0 | -718.0 - -347.0 |
| 99996 | 0 | Cash loans | M | False | True | 0 | 225000.0 | 562500.0 | 31540.5 | 562500.0 | Unaccompanied | Working | Secondary / secondary special | Married | House / apartment | 0.018209 | -20956 | -3053 | -13427.0 | -4280 | <NA> | 1 | 1 | 0 | 1 | 1 | 0 | Drivers | 2 | 3 | 3 | FRIDAY | 12 | 0 | 0 | 0 | 0 | 0 | 0 | Business Entity Type 3 | NaN | 0.140261 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | 2 | 1 | 2 | 1 | -842.0 | 0 | 1 | 0 | 0 | 0 | ... | 3127500.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | -1914.0 | 994.0 | -498.0 | -498.0 | 5160093.0 | 0.0 | 0.0 | 0.0 | -498 | 375000.000000 | 1125000.00 | 531804.000000 | 1595412.00 | 0.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 470412.0 | 0.000000 | NaN | XAP | Approved | -542.0 | XNA | XAP | Repeater | Cards | XNA | x-sell | XNA | 3.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 1.500000 | -2842.0 | NaN | 3.0 | 0.0 | 0.0 | 0.0 | 3.0 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.333333 | 0.0 - 0.39 | NaN | NaN | -17912.0 - -2760.25 | 512338.5 - 808650.0 | NaN | 117146.25 - 10985769.0 | -25201.0 - -19681.0 | 0.0 - 6111.0 | NaN | 906.0 - 1682.0 | 1.33 - 2.0 | NaN | 450000.0 - 679500.0 | -1571.0 - -762.0 | 0.0 - 0.25 | -4305.0 - -3269.0 | 24916.5 - 34587.0 | 202500.0 - 117000000.0 | 1 | -2922.0 - -624.0 | 0.8 - 1.0 | -718.0 - -347.0 |
| 99997 | 0 | Revolving loans | M | True | False | 1 | 135000.0 | 180000.0 | 9000.0 | 180000.0 | Family | Working | Higher education | Married | House / apartment | 0.035792 | -10578 | -592 | -5307.0 | -3257 | 11 | 1 | 1 | 0 | 1 | 0 | 0 | Core staff | 3 | 2 | 2 | SATURDAY | 14 | 0 | 0 | 0 | 0 | 0 | 0 | Self-employed | 0.602777 | 0.487365 | 0.490258 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | 0 | 0 | 0 | 0 | -2521.0 | 0 | 0 | 0 | 0 | 0 | ... | 91593.0 | 49016.43 | 0.0 | 0.0 | 0.0 | 0.0 | -229.0 | 206.0 | -3.0 | NaN | 5967472.0 | 2.0 | 0.0 | 0.0 | 0 | 103455.000000 | 103455.00 | 93109.500000 | 93109.50 | 10345.5 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | -10345.5 | 0.111111 | NaN | XAP | Approved | -320.0 | Cash through the bank | XAP | New | POS | Computers | XNA | middle | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.000000 | -2702.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.000000 | 0.39 - 0.57 | 0.37 - 0.54 | 0.51 - 0.68 | -1211.0 - -289.0 | 45000.0 - 270000.0 | 0 - 5 | -3960000.0 - -558.0 | -12412.0 - -7673.0 | 6111.0 - 17820.0 | 141482.43 - 43650000.0 | 906.0 - 1682.0 | 0.5 - 1.0 | 0.0 - 2076.99 | 45000.0 - 238500.0 | -4131.0 - -1571.0 | 0.0 - 0.25 | -3269.0 - -1728.0 | 1980.0 - 16573.5 | 112500.0 - 148500.0 | 0 | -143.0 - -1.0 | 0.8 - 1.0 | NaN |
| 99998 | 0 | Cash loans | F | True | True | 0 | 135000.0 | 254700.0 | 17149.5 | 225000.0 | Unaccompanied | Working | Incomplete higher | Single / not married | Rented apartment | 0.002506 | -8062 | -92 | -6446.0 | -724 | 13 | 1 | 1 | 0 | 1 | 0 | 0 | High skill tech staff | 1 | 2 | 2 | TUESDAY | 7 | 0 | 0 | 0 | 1 | 1 | 0 | Housing | 0.352214 | 0.714284 | 0.651260 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | 1 | 0 | 1 | 0 | -707.0 | 0 | 1 | 0 | 0 | 0 | ... | 540000.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | -473.0 | -106.0 | -100.0 | -106.0 | 5414423.0 | 0.0 | 0.0 | 0.0 | -106 | 396369.000000 | 396369.00 | 380272.500000 | 380272.50 | 39640.5 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | -16096.5 | 0.104242 | NaN | XAP | Approved | -707.0 | Cash through the bank | XAP | New | POS | Audio/Video | XNA | low_normal | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.000000 | -1406.0 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.000000 | 0.66 - 0.85 | 0.54 - 0.67 | 0.33 - 0.51 | -289.0 - 365243.0 | 45000.0 - 270000.0 | 9 - 15 | -3960000.0 - -558.0 | -12412.0 - -7673.0 | 17820.0 - 3960000.0 | -220213.42 - 0.0 | -41875.0 - 120.0 | 0.0 - 0.5 | 0.0 - 2076.99 | 45000.0 - 238500.0 | -762.0 - -279.0 | 0.0 - 0.25 | -1728.0 - 0.0 | 16573.5 - 24916.5 | 112500.0 - 148500.0 | 1 | -624.0 - -300.0 | 0.8 - 1.0 | -153.0 - 0.0 |
| 99999 | 0 | Cash loans | M | True | True | 2 | 157500.0 | 746280.0 | 59094.0 | 675000.0 | Unaccompanied | Working | Higher education | Married | House / apartment | 0.020246 | -13934 | -1210 | -3758.0 | -5221 | 19 | 1 | 1 | 0 | 1 | 0 | 1 | Drivers | 4 | 3 | 3 | WEDNESDAY | 11 | 0 | 0 | 0 | 0 | 0 | 0 | Self-employed | 0.409607 | 0.366777 | 0.502878 | 0.0062 | NaN | 0.9886 | NaN | NaN | NaN | 0.2069 | 0.0417 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0063 | NaN | 0.9886 | NaN | NaN | NaN | 0.2069 | 0.0417 | NaN | NaN | NaN | NaN | NaN | NaN | 0.0062 | NaN | 0.9886 | NaN | NaN | NaN | 0.2069 | 0.0417 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | block of flats | 0.0162 | Stone, brick | False | 0 | 0 | 0 | 0 | -2452.0 | 0 | 1 | 0 | 0 | 0 | ... | 37800.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | -2708.0 | -1073.0 | -892.0 | -1073.0 | 6671057.0 | 1.0 | 0.0 | 0.0 | -552 | 48712.500000 | 194850.00 | 48766.500000 | 195066.00 | 4725.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 216.0 | 0.024223 | NaN | XAP | Approved | -174.0 | XNA | XAP | Repeater | POS | Auto Accessories | XNA | middle | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | 1.250000 | -2237.0 | NaN | 3.0 | 0.0 | 1.0 | 0.0 | 4.0 | 0.750000 | 0.000000 | 0.250000 | 0.000000 | 0.250000 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.000000 | 0.0 - 0.39 | 0.37 - 0.54 | 0.33 - 0.51 | -1211.0 - -289.0 | 512338.5 - 808650.0 | 15 - 91 | -558.0 - 16020.0 | -15739.0 - -12412.0 | 0.0 - 6111.0 | 43926.19 - 141482.43 | 120.0 - 906.0 | 1.0 - 1.33 | NaN | 450000.0 - 679500.0 | -4131.0 - -1571.0 | 0.0 - 0.25 | -6551.0 - -4305.0 | 34587.0 - 230161.5 | 148500.0 - 202500.0 | 1 | -2922.0 - -624.0 | 0.5 - 0.8 | -718.0 - -347.0 |
100000 rows × 250 columns
Tight layout not applied. tight_layout cannot make axes width small enough to accommodate all axes decorations
set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.
--------------------------------------------------------------------------- TypeError Traceback (most recent call last)
/tmp/ipykernel_17429/466980156.py in ?()
1 importlib.reload(graph)
2
----> 3 graph.summary_df_features(features_matrix_sorted_imp)

~/data/projects/ppuodz-ML.4.1/shared/graph.py in ?(source_df)
1207 plt.show()
1208
1209 except Exception as ex:
1210 plt.close(fig)
-> 1211 raise ex

~/data/projects/ppuodz-ML.4.1/shared/graph.py in ?(source_df)
1207 plt.show()
1208
1209 except Exception as ex:
1210 plt.close(fig)
-> 1211 raise ex

~/miniconda3/envs/rapids_v2/lib/python3.10/site-packages/scipy/stats/_stats_py.py in ?(a, axis, nan_policy)
1988 if contains_nan and nan_policy == 'omit':
1989 a = ma.masked_invalid(a)
1990 return mstats_basic.normaltest(a, axis)
1991
-> 1992 s, _ = skewtest(a, axis)
1993 k, _ = kurtosistest(a, axis)
1994 k2 = s*s + k*k
1995

~/miniconda3/envs/rapids_v2/lib/python3.10/site-packages/scipy/stats/_stats_py.py in ?(a, axis, nan_policy, alternative)
1602
1603 if axis is None:
1604 a = np.ravel(a)
1605 axis = 0
-> 1606 b2 = skew(a, axis)
1607 n = a.shape[axis]
1608 if n < 8:
1609 raise ValueError(

~/miniconda3/envs/rapids_v2/lib/python3.10/site-packages/scipy/stats/_axis_nan_policy.py in ?(***failed resolving arguments***)
519 # behavior of those would break backward compatibility.
520
521 if sentinel:
522 samples = _remove_sentinel(samples, paired, sentinel)
--> 523 res = hypotest_fun_out(*samples, **kwds)
524 res = result_to_tuple(res)
525 res = _add_reduced_axes(res, reduced_axes, keepdims)
526 return tuple_to_result(*res)

~/miniconda3/envs/rapids_v2/lib/python3.10/site-packages/scipy/stats/_stats_py.py in ?(a, axis, bias, nan_policy)
1190 a = ma.masked_invalid(a)
1191 return mstats_basic.skew(a, axis, bias)
1192
1193 mean = a.mean(axis, keepdims=True)
-> 1194 m2 = _moment(a, 2, axis, mean=mean)
1195 m3 = _moment(a, 3, axis, mean=mean)
1196 with np.errstate(all='ignore'):
1197 zero = (m2 <= (np.finfo(m2.dtype).resolution * mean.squeeze(axis))**2)

~/miniconda3/envs/rapids_v2/lib/python3.10/site-packages/scipy/stats/_stats_py.py in ?(a, moment, axis, mean)
1065 n_list.append(current_n)
1066
1067 # Starting point for exponentiation by squares
1068 mean = (a.mean(axis, keepdims=True) if mean is None
-> 1069 else np.asarray(mean, dtype=dtype)[()])
1070 a_zero_mean = a - mean
1071
1072 eps = np.finfo(a_zero_mean.dtype).resolution * 10

TypeError: float() argument must be a string or a real number, not 'NAType'
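The `TypeError` above ends in `'NAType'`, which suggests that a column containing `pd.NA` (for example one with a nullable pandas dtype) reached `scipy.stats.normaltest`. The following is a minimal sketch of one possible workaround, not the actual code in `graph.py`: the helper name `normaltest_nullable` is hypothetical, and it assumes the column is numeric apart from the missing values.

```python
import numpy as np
import pandas as pd
from scipy import stats


def normaltest_nullable(series: pd.Series):
    """Run scipy.stats.normaltest on a numeric column that may contain pd.NA.

    Hypothetical helper (an assumption, not part of the project's graph module).
    """
    # Convert to a plain float64 array; pd.NA is mapped to np.nan
    values = series.to_numpy(dtype="float64", na_value=np.nan)
    # Drop the missing values so SciPy only sees real numbers
    values = values[~np.isnan(values)]
    # skewtest (called inside normaltest) needs at least 8 observations
    if values.size < 8:
        return None
    return stats.normaltest(values)
```

A call such as `normaltest_nullable(df[col])` would then return either `None` (too few observations) or a result with `statistic` and `pvalue` attributes.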
Clustering the data before training XGBoost can simplify it and may improve model performance by surfacing patterns that XGBoost might otherwise overlook. This preprocessing step can reduce dimensionality and aid interpretability, but whether it helps depends on how relevant the resulting clusters are to the target, which should be checked through feature importance evaluation.
This algorithm works best when the dataset has clear cluster boundaries and a roughly uniform distribution. We were unable to obtain clearly defined clusters with it, and given the type of dataset it is probably not the most suitable algorithm.
This is an unsupervised algorithm that is better suited to datasets with significant noise or irrelevant data points, or to data with non-globular or irregularly shaped clusters.
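The idea of clustering as a preprocessing step can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code used in this project: scikit-learn's `KMeans` is chosen purely as an example algorithm, the helper name `add_cluster_feature` and the default of five clusters are hypothetical, and the inputs are assumed to be numeric arrays without missing values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def add_cluster_feature(X_train, X_test, n_clusters=5, random_state=0):
    """Append a cluster label as an extra column (hypothetical helper)."""
    # Fit the scaler and the clusterer on training data only to avoid leakage
    scaler = StandardScaler().fit(X_train)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    train_labels = km.fit_predict(scaler.transform(X_train))
    # Assign test rows to the clusters learned from the training data
    test_labels = km.predict(scaler.transform(X_test))
    # The cluster label becomes one more feature for the boosting model
    return (
        np.column_stack([X_train, train_labels]),
        np.column_stack([X_test, test_labels]),
    )
```

The augmented matrices can then be passed to the gradient boosting model in place of the original ones, and the importance of the new column indicates whether the clusters carry any signal. `KMeans` does not accept NaN values, so missing values would need to be imputed first.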
Feature
Additionally, we've included a model selected using EvalML (an AutoML library) and trained on a raw dataset (with Featuretools aggregations etc.). [TODO: include notebook]